Scientists worry a lot about the reliability of their measurements, because this can have a crucial effect on the success of an experiment.
An example from everyday life
Bill has decided he needs to lose weight and he wants to try a new diet. However, he isn’t convinced the diet will work. Bill decides to measure his weight before starting the diet, follow the diet for 2 weeks and then measure his weight again. Bill knows his weight can change during the day (e.g. before and after eating, exercising and so on), so he decides to measure his weight 3 times during the day before the diet, and 3 times during the day after the diet.
Here are the results:
Before diet: 93.9, 94.0, 94.1 kg (average = 94 kg)
After diet: 91.9, 92.0, 92.1 kg (average = 92 kg).
It certainly looks like Bill lost some weight, around 2 kg. The readings on each day are quite close together: we could say the data on each day isn’t very “scattered”, because every measurement is within 0.1 kg of that day’s average. The scatter of the data is quite small, and there is a noticeable difference between the average readings before and after the diet, so Bill can be reasonably confident the weight loss is real.
But now imagine his results look like this instead.
Before diet: 93, 94, 95 kg (average = 94 kg)
After diet: 91, 92, 93 kg (average = 92 kg).
This time, although the averages are still the same, the results on each day are much more scattered: the individual readings are up to 1 kg away from each day’s average. Can Bill still be sure he has really lost weight? Indeed, one reading before the diet was 93 kg, and one reading after the diet was also 93 kg. If you looked at those figures only, you might think the diet hadn’t worked.
The point is that it’s harder to tell whether Bill has really lost weight when the data are less consistent (more “scattered”). So when we look at scientific results, we have to look at both the average values and the scatter of the data.
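One common way to put a number on the “scatter” is the standard deviation, which is roughly the typical distance of the readings from their own average. (The example above doesn’t name a particular measure of scatter, so the standard deviation is just one reasonable choice.) A short Python sketch using Bill’s readings:

from statistics import mean, stdev

# Bill's readings from the two versions of the example
consistent_before = [93.9, 94.0, 94.1]
consistent_after  = [91.9, 92.0, 92.1]
scattered_before  = [93.0, 94.0, 95.0]
scattered_after   = [91.0, 92.0, 93.0]

for label, readings in [("consistent, before", consistent_before),
                        ("consistent, after",  consistent_after),
                        ("scattered, before",  scattered_before),
                        ("scattered, after",   scattered_after)]:
    # mean() gives the average; stdev() gives the sample standard deviation,
    # a measure of how far the readings typically sit from that average
    print(label, "average =", round(mean(readings), 1), "kg,",
          "scatter =", round(stdev(readings), 2), "kg")

Both versions give the same averages (94 kg before, 92 kg after), but the standard deviation is 0.1 kg for the consistent readings and 1 kg for the scattered ones, which is exactly the difference in “scatter” described above.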
How can we tell if two sets of numbers are really different? Did Bill really lose weight? If the difference between the averages is large, and the scatter of the data is small, it might be obvious. But in other cases, scientists use statistical tests that take into account both how big the difference between the averages is and how scattered the data are. The statistics allow us to calculate a probability (“P”) that a difference as large as the one we observed could have come about by pure chance, simply because the data are so scattered. This might be written as “P<0.05”, which means that if there were really no difference, the probability of seeing a difference this large just by chance would be less than 5%. One way to think about this is that if the diet really had no effect, we would expect to see a difference this large less than 5% of the time, so we can be reasonably confident the change is real.
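As an illustration, one common statistical test for comparing two small groups of measurements is Student’s t-test (the text above doesn’t name a specific test, so this is just one plausible choice). A Python sketch using the scipy library:

from scipy import stats

# Bill's readings in the "scattered" version of the example
before = [93.0, 94.0, 95.0]
after  = [91.0, 92.0, 93.0]

# A two-sample t-test compares the two averages while taking the scatter
# within each group into account, and returns a P value
t_statistic, p_value = stats.ttest_ind(before, after)
print("scattered data:  P =", round(p_value, 3))

# The same 2 kg difference with the more consistent readings
t_statistic, p_value = stats.ttest_ind([93.9, 94.0, 94.1], [91.9, 92.0, 92.1])
print("consistent data: P =", round(p_value, 5))

For the scattered readings the P value comes out at roughly 0.07, so Bill cannot claim P<0.05, whereas for the consistent readings the P value is far below 0.05, even though the difference between the averages is 2 kg in both cases.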
Scientists make a distinction between precision and accuracy. Precise results are ones in which all the readings are quite close together. In contrast, accurate results are ones that are close to the true value we are trying to measure. So results can be precise but inaccurate if the individual measurements are very close together but a long way from the true value! Scientists often report their results with an estimate of the scatter of the data and sometimes refer to this as the “error”, although it is really a measure of variability (such as the variance or standard deviation) rather than a mistake. The larger the “error”, the more scattered the data, and the less confidence we can have in the average value.
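A tiny made-up illustration of the difference (the numbers below are invented for this sketch and are not part of Bill’s example): imagine a bathroom scale that always reads about 1.5 kg too heavy.

from statistics import mean, stdev

true_weight = 94.0                 # what Bill actually weighs
readings    = [95.4, 95.5, 95.6]   # very close together, but all too high

# Small scatter means the readings are precise
print("scatter:", round(stdev(readings), 2), "kg")
# A large offset from the true value means they are not accurate
print("offset from true value:", round(mean(readings) - true_weight, 1), "kg")

The readings are precise (they agree with each other to within 0.1 kg) but inaccurate (they are all about 1.5 kg away from the true value).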
A scientific example
What can scientists do if the difference between two groups is quite small and/or the data are quite scattered? The best solution is to improve the precision of our experiments, but that is not always possible.
Another solution is to test lots of samples. This increases the “power” of the statistics: by testing more samples it becomes easier to detect small differences. This has real-world implications that affect all of us! For example, if a pharmaceutical company is testing a new drug in patients, but it is anticipated that the data will be very scattered, then the drug has to be tested in lots of patients before we can be sure any effect of the drug is real. But testing the drug in lots of patients makes the cost of the trial much higher, and consequently raises the price we have to pay for the drug. So an apparently mundane subject such as statistics can affect whether we can afford a new drug!
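To see how this works in practice, statisticians run a “power calculation” before a trial to estimate how many patients are needed. The Python sketch below uses the statsmodels library and the conventional choices of a 5% significance level and 80% power; these particular settings, and the use of a t-test, are our assumptions rather than anything stated above. The “effect size” is the expected difference between the group averages divided by the scatter (standard deviation), so more scattered data means a smaller effect size.

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# A smaller effect size means the expected difference is small relative to the scatter
for effect_size in [1.0, 0.5, 0.2]:
    n_per_group = analysis.solve_power(effect_size=effect_size, alpha=0.05, power=0.8)
    print("effect size", effect_size, "-> about", round(n_per_group), "patients per group")

With these settings the required numbers grow quickly as the effect size shrinks, from roughly 17 patients per group at an effect size of 1.0 to roughly 400 per group at 0.2, which is why very scattered data make trials, and therefore drugs, so much more expensive.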