Sometimes it appears that scientists worry about all sorts of problems before actually doing an experiment, but there’s a reason for this. One of the things scientists often want to do is to make comparisons between two things, such as comparing which of two drugs is better for treating a particular disease. It is important that the experiment to test this is designed to allow a “fair comparison”. If you want to compare which of two children can run the fastest, you could set up a race, shout “Go” and see who crosses the finish line first. But if one child had a 20-metre head start, it’s obvious that wouldn’t be a fair comparison.


An example from everyday life.

Consider a teacher who wants to test if a new way of teaching multiplication is better than the old way. The teacher could ask the class to split into two groups, teach one group the old way and the other group the new way. Afterwards, the teacher could give the students a test and see which group did better in the test.

That sounds reasonable, right?

But there are some problems with this. What would happen if the students stayed with their friends when they divided into groups? Perhaps the students who are good at maths tend to be friends with one another, so when they split into groups, one group had more students who were good at maths than the other group. So now, even if both methods for teaching multiplication were equally good, one group might do better in the test simply because that group had better maths students. That could lead us to incorrectly conclude that one method of teaching multiplication was better than the other.

The solution to this problem is to “randomise” the students – assign them to the groups at random. It would then be sensible to check that the “maths ability” of the two groups was similar. We could do this by looking at how well the students in each group did in their last maths test. If the average results of the two groups in that test were similar, it would be reasonable to use these groups to test the new method for teaching multiplication.

By doing this, we are setting up a fair comparison.
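As a rough sketch of this procedure (the names and previous test scores here are made up for illustration), the random assignment and the check on the group averages might look like this:

```python
import random

# Made-up students with their scores in the last maths test
students = [("Amy", 72), ("Ben", 55), ("Cara", 90), ("Dan", 64),
            ("Eve", 81), ("Finn", 47), ("Gita", 78), ("Hal", 60)]

random.shuffle(students)                          # assign at random...
group_a, group_b = students[::2], students[1::2]  # ...into two equal groups

def average_score(group):
    """Average mark of a group in the last maths test."""
    return sum(score for _, score in group) / len(group)

# If the averages are close, the two groups start off on a level footing
print(average_score(group_a), average_score(group_b))
```

Because the assignment is random, friendships (or maths ability) no longer decide who ends up in which group.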

We would have to worry about other things as well to make sure the comparison was fair. For example, the two groups should be taught by the same teacher, take the test at the same time of day, in rooms with comparable amounts of noise, and so on.


An example from medicine.

Let’s return to our problem of comparing two medicines to treat a disease – is a new medicine better than the existing medicine? To do this, we could split patients at random into groups, give one group the old medicine and one group the new medicine and measure the effect of the drugs on the disease in the two groups.

We need to randomise the patients to each of the two groups. Some patients may be more sick than others, and they may differ in age, sex, body weight, race and so on. All of these factors could affect how well the drug works. So we would need to check that any factor which might affect how well the drug works is equally represented in the two groups.

Is this really worth worrying about?

Well, imagine the patients were not randomly distributed between the two groups. Perhaps one group had patients who were more sick, and this group received the new drug. Those patients might not respond to the drug as well as the other group, simply because they were more sick in the first place. We might conclude that the new drug is less effective, even though in reality it might be even better than the old drug. As a result, we might not use a valuable new drug, preventing patients from being treated as well as possible, and wasting the millions of pounds that were spent on developing the drug.

So it’s important that when we make comparisons, we set them up as fairly as possible so the results tell us what we are really trying to find out. Another way of making comparisons fair involves a concept called “Ratios and normalisation” and we will discuss that on another page.



If you talk to a scientist, you might think that they can’t give you a straight answer. For example, if you ask a scientist “is this drug safe?”, it’s very unlikely you will get a simple “yes” – the answer will probably be accompanied by some qualifying statements. And although this might irritate you, there are many good reasons for it. Some of the reasons scientists are cautious when presenting their findings are discussed below.


Truths aren’t always universal

Some things may be true in some circumstances but not in others. For example, is it a good idea to do exercise and keep fit? Yes. Unless you are sick and need a period of rest to recover. Returning to our original example, is it safe to take this drug? Yes, but not for all patients. So when scientists talk about their work, if it’s not presented as simply as you might like, it’s not that we are being awkward – it’s probably that we are trying to be accurate.


Absence of evidence is not evidence of absence.

This complicated sentence means that even if you can’t detect something, it may still be there. For example, before the invention of the microscope we couldn’t see bacteria (these are really small, about 1/1000th of a millimetre). That didn’t mean they weren’t there, rather we lacked the right equipment. We refer to this as sensitivity – it’s a reflection of our ability to detect something that really is there. (A related term, “specificity”, is sometimes also used; this refers to our ability to conclude correctly that something is absent when it really is absent.)

Remember that question earlier on about whether a drug is safe or not? Well, we only truly know if a drug is safe by giving it to patients. Suppose we test the drug in 1000 patients and there are no problems. Does that mean it’s safe? What if it causes severe problems (eg a heart attack) in 1 in 10,000 patients, so that no problems were found when only 1000 patients were tested? Do you still think it’s safe? So how many patients would you have to test to prove with absolute certainty that it’s always safe? The answer is that we can never do that.
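We can put a rough number on this. If the true rate of the side effect is 1 in 10,000, and we assume each patient is affected independently, then the chance of a trial seeing no cases at all is (1 − 1/10,000) multiplied by itself once per patient:

```python
rate = 1 / 10_000  # assumed true rate of the severe side effect

for n_patients in (1_000, 10_000, 100_000):
    # Chance that a trial of this size sees zero cases of the side effect
    p_no_cases = (1 - rate) ** n_patients
    print(f"{n_patients:>7} patients: {p_no_cases:.1%} chance of seeing no cases")
```

With 1000 patients there is roughly a 90% chance of seeing no cases at all, so “no problems were found” is surprisingly weak evidence of safety.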


False-positives and false-negatives.

Scientists are well aware that when we measure something, we can sometimes get the wrong result. For example, John goes to his doctor to be tested to see if he has diabetes.

If he really does not have diabetes but the test is wrong and indicates he does have diabetes, we could call that a “false-positive” – the test was “false” in saying he had tested positive for diabetes.

On the other hand, if he really does have diabetes but the test is wrong and indicates that he does not have diabetes, we would call that a “false-negative” – the test was “false” in saying he had tested negative for diabetes.

When scientists design tests, they have to worry how often false-positives and false-negatives occur. If they are too frequent, the results of the test may not be trustworthy. This makes us cautious about interpreting results.
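With some made-up test outcomes, counting the two kinds of error is just a matter of comparing each test result against the truth:

```python
# Each pair is (really_has_diabetes, test_says_diabetes) — made-up results
results = [(True, True), (True, False), (False, True),
           (False, False), (False, False), (True, True)]

# Test said "yes" but the person does not have the disease
false_positives = sum(1 for truth, test in results if test and not truth)
# Test said "no" but the person really does have the disease
false_negatives = sum(1 for truth, test in results if truth and not test)

print(false_positives, false_negatives)  # 1 false-positive, 1 false-negative
```

Dividing these counts by the number of healthy and sick people respectively gives the error rates scientists worry about.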


Looking carefully at the primary data – not taking people’s word for things.

Recently in the UK, the law was changed to allow fathers to take more time off work when their child was born. A survey found that only 1% of men had taken parental leave, suggesting that men weren’t taking up the opportunity. But then it transpired that only a small proportion of the men questioned in the survey had actually had a child recently. Of course men who hadn’t had a child recently hadn’t taken parental leave recently – the survey was absurd. But that didn’t stop it from making national news, and it could have gone on to influence government policy. Looking closely at the “primary data”, rather than relying on someone else’s interpretation, can help to overcome this problem.


What do numbers really mean?

Often when you hear someone quote a figure to support their argument, they are using it to make a point. And scientists are just the same. However, we all need to stop and ask ourselves “what do these numbers really mean?”

Suppose that there is an activity that many of us do every day in the UK and I want to convince you that it’s safe. Perhaps I might tell you that less than 0.0027% of the UK population die from it every year. Sounds pretty safe, right?

Now let’s do a little maths. There are around 64 million people in the UK and 0.0027 % of this is approximately 1700 people. So, in fact, 1700 people die from this activity every year in the UK. Does it still sound as safe, now we have looked at “what the numbers really mean”? What was the activity? Well, 1700 people died in the UK in 2013 in road traffic accidents.
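The arithmetic is worth checking for yourself:

```python
uk_population = 64_000_000
deaths_percent = 0.0027        # percentage of the population per year

# Convert the percentage into an actual number of people
deaths_per_year = uk_population * deaths_percent / 100
print(round(deaths_per_year))  # about 1700 people per year
```

A tiny-sounding percentage of a very large population is still a lot of people.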

In the same way, scientists have to think carefully about the implications of their data. For example, drinking alcohol can increase the risk of getting breast cancer by approximately 10%. But what does this figure really mean? Well, the risk of a woman getting breast cancer during her life is approximately 12% if she doesn’t drink alcohol. That means 12 in 100 women who do not drink alcohol are likely to develop breast cancer during their life. If a woman drinks alcohol, the risk is increased by 10% of this value. 10% of 12 is approximately 1, so 1 more woman of those 100 is likely to get breast cancer – in other words, approximately 12 + 1 = 13 of 100 women who drink alcohol are likely to get breast cancer.

In summary, 12 in 100 women are likely to get breast cancer if they don’t drink alcohol and this risk rises to 13 in 100 if they do drink alcohol, and this represents approximately a 10% increase in risk. You can look at those figures and make your own mind up about your own alcohol consumption, but the sobering fact is that these figures mean that alcohol consumption contributes to around 5000 cases of breast cancer in the UK each year.
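The same relative-risk arithmetic, written out step by step:

```python
baseline_risk = 12        # women in 100 likely to develop breast cancer
relative_increase = 0.10  # alcohol raises the risk by ~10% of the baseline

# A 10% *relative* increase on a 12-in-100 baseline
extra_cases = baseline_risk * relative_increase  # about 1.2 more women per 100
risk_with_alcohol = baseline_risk + extra_cases

print(round(extra_cases, 1), round(risk_with_alcohol, 1))  # 1.2 extra -> 13.2 per 100
```

The extra 1.2 women per 100 is what the text rounds to “approximately 1”, giving roughly 13 in 100 overall: a relative increase only makes sense once you know the baseline it is relative to.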


Over-interpreting data.

This refers to the idea that when a scientist interprets an observation, we shouldn’t read more into it than the observation can support. A great example of over-interpretation of data is an imaginary conversation between two astronomers in the 19th century (this example is adapted from Carl Sagan’s book “Cosmos”). It goes like this.

“There is absolutely nothing to see on Venus when I look with my telescope”

“I wonder why that is? Maybe the surface is obscured by clouds”

“That’s possible. What type of landscapes are covered by clouds?”

“Well, landscapes with lots of liquid on the surface”

“That’s right. What types of planets have lots of liquids on their surface?”

“Oh, I don’t know. Planets with seas, I guess, and maybe marshes.”

“Good point. What do we find in the sea or in marshland?”

“Well, often there are fish, or seabirds, maybe some mammals that like water.”


Observation: there is nothing to see on Venus

Conclusion: Venus is teeming with wildlife.


Clearly that is nonsense and it’s hard to believe something so absurd would happen.

One solution to the problem is to keep our interpretation as simple as possible. This is sometimes referred to as “Occam’s razor”, which essentially says that when we are trying to interpret our observations, it’s probably safest to use the simplest explanation we can.


Another solution is to be very critical when we think about what exactly it is we have measured, and to realise it may not be what we hoped we were measuring. For example, if I decided to measure global warming by measuring the temperature in one location (eg outside my house) for a few months, that wouldn’t tell us about annual global temperatures – it would just give information about the temperature outside my house for those few months. There are lots of reasons why this may not reflect what is happening on the rest of the planet for the rest of the year. For example, maybe the house is in the shade of another building, or maybe the months of the measurements were unusually cold compared to the rest of the year. So it’s important to remember that what we would like to measure and what we are actually measuring are not necessarily the same thing.


Thinking of other explanations for data.

Sometimes, the most obvious explanation of some data isn’t the correct one. For example, suppose we see an increase in the recorded incidence of a disease. The obvious explanation is that whatever is causing the disease has increased. But another explanation is that we have just become better at detecting the disease. Similarly, if you look outside your house and notice an unusual amount of traffic congestion, what is going on? Is there a problem in the road ahead causing the traffic to tail back (the obvious explanation)? Or is there a problem somewhere else – perhaps a main road has been closed, creating a diversion that sends an abnormally large volume of traffic past your house? In that case, the problem isn’t in the road ahead at all; it’s the diversion causing lots of cars to be forced past your house.


Scepticism and peer review.

All this tends to make scientists quite sceptical. Indeed, scepticism is even built into the scientific method. You will recall from the page on “The Scientific Method” that we start with a “hypothesis”. In fact, we start off with what is called the “null hypothesis” which essentially says that whatever our favourite theory is, it’s wrong! It’s up to us as scientists to gather data to prove that the null hypothesis is incorrect. If you like, we deliberately stack the odds against ourselves to help ensure we get it right in the end.

Added to that, scientists like nothing better than trying to poke holes in each other’s work. We are incredibly critical of each other’s work, hopefully in a non-personal way. It’s up to us to convince our colleagues that our observations and conclusions are correct, and that is where peer review comes in. Before we publish our results, they are reviewed by our colleagues (our “peers”) to see if our conclusions are justified. Peer review doesn’t guarantee this, but it goes some way to ensuring a reasonable standard is met.