Research means studying things we don’t yet understand. As Einstein put it, “If we knew what it was we were doing, it would not be called research, would it?” So how do you go about studying something you don’t understand? Science’s answer to this is to use a structured approach called the scientific method.
- HYPOTHESIS. We normally start from a “hypothesis”. This is somewhere between an idea and a theory. The hypothesis can come from another experiment that has thrown up some interesting facts, or it can come purely from inspiration. For example, suppose I am visiting a wildlife park and I notice some flamingos standing in a lake on one leg. I wonder if they do this to keep their feet dry, and this now becomes my hypothesis.
- EXPERIMENT. The next step is to design an experiment to test whether the hypothesis is true. We have to design the experiment to take into account all sorts of things that could go wrong, and that is covered on the other pages on this website. Then we have to actually do the experiment. To test our hypothesis that flamingos stand on one leg to keep their feet dry, we might predict that when they aren’t in a lake they will put both feet down on the ground. So we could design an experiment to test whether flamingos accustomed to standing on one leg put both feet down when they are on dry land.
- INTERPRET RESULTS. Now we have to interpret the results to see if they support the hypothesis. If they do, we have discovered something. But we also have to be critical of our own results – could anything have gone wrong? In the case of the flamingos, if they still stand on one leg when they are on dry land, we might conclude that the hypothesis was wrong – they are not trying to keep their feet dry after all, they stand on one leg for a different reason. But perhaps we now realise that flamingos spend so much time standing on one leg in the lake that they have become used to it and simply continue to do so on dry land. If that is the case, our experiment may not be telling us anything about their original reason for standing on one leg. We would have to be cautious about drawing any firm conclusions, and really we should have designed a better experiment.
- REVIEW. The last step is to incorporate our new knowledge into our understanding of the subject we are studying.
Invariably, this throws up new questions, and we start again with a new hypothesis.
Not all science falls into this neat pattern. Sometimes we are just answering the question “I wonder what would happen if….” Sometimes we are addressing a technological challenge, which will lead to useful advances along the way; sending a man to the moon probably fits into that category. Other experiments are done deliberately just to throw up ideas. Lastly, some science doesn’t involve laboratory-based experimentation but instead depends on developing theories, creating computer simulations or simply observing the world around us.
Science can also be broken down into basic science and applied science. In the former, we are trying to understand the way the world (or Universe!) around us works, whereas in applied science we are trying to achieve a specific objective. Understanding why flamingos stand on one leg is basic science, but trying to find a drug to treat influenza is applied science. The term “translational science” has recently become fashionable; it refers to science that is done to “translate” the findings of basic research into a real-world application.
All of these types of research are important. Without the basic research, we wouldn’t have the necessary knowledge to even start to do the translational or applied science. Although it may sound like translational and applied science are more useful, the basic science is really where the ideas come from in the first place, so we need all three.
Some scientists (particularly those involved in the biological sciences) talk of “positive controls” (other scientists may call these a “reference” or a “standard”) and “negative controls”. The terms can seem confusing at first, but once you understand what they mean they’re quite easy.
Examples from everyday life.
Positive controls. Have you ever bought a new car? Did you have a test drive first to get an idea of how the car performs? The test drive tells you the standard that you can expect. When you get your new car, it might not be the actual car you took on a test drive, but it should be the same model and so perform similarly. Now suppose you take delivery of your new car, and it doesn’t match up to the car you took on a test drive. Maybe it doesn’t accelerate as well, or some accessories are missing. You could reasonably go back to the showroom, point out the deficiencies and have your new car repaired or replaced, or maybe even ask for your money back.
The test drive was your “positive control” – it set the standard, it showed you what should happen. If you hadn’t taken the test drive, you might not have realised that your new car was defective. That’s why positive controls are so useful – they tell you what to expect if things go well.
Negative controls. A negative control is the opposite of a positive control. It tells you what should happen if your experimental intervention does nothing. Suppose you have heard that adding grated beetroot to chocolate cake mix makes it taste even better. So you head to the kitchen, bake a chocolate cake with beetroot in it, and it tastes great! But, wait! How do you know it’s any better than your normal chocolate cake? The only way to test this is to bake a chocolate cake using your normal recipe – instead of adding beetroot you just use the regular ingredients. This is your “negative control” – it sets the standard if you do nothing to alter the recipe. So now you can compare the beetroot-enhanced cake with the normal one and see whether there really is a difference.
Scientific examples.
For scientists, positive controls are very helpful because they allow us to be sure that our experimental set-up is working properly. For example, suppose we want to test how well a new drug works and we have designed a laboratory test to do this. We test the drug and it works, but has it worked as well as it should? The only way to be sure is to compare it to another drug (the positive control) which we know works well. The positive control drug is also useful because it tells us our experimental equipment is working properly. If the new drug doesn’t work, we can rule out a problem with our equipment by showing that the positive control drug works.
The “negative control” sets what we sometimes call the “baseline”. Suppose we are testing a new drug to kill bacteria (an antibiotic), and to do this we are going to count the number of bacteria that are still alive in a test tube after we add the drug. We could set up an experiment with three tubes.
- One tube could contain the drug we want to test.
- The second tube would contain our positive control (a different drug which we know will kill the bacteria).
- The last tube is our negative control – it contains a drug which we know has no effect on the bacteria. This tells us how many bacteria would be alive if we didn’t kill any of them.
If the new drug is working, there should be fewer bacteria left alive in the first tube than in the last tube, and ideally the number still alive (if any) should be the same in the first and second tubes.
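For readers who like code, here is a minimal sketch in Python of how the counts from such an experiment might be recorded and compared. All the numbers and names are invented purely for illustration.

```python
# Hypothetical counts of bacteria still alive in each tube.
counts = {
    "new_drug": 40,           # tube 1: the drug we want to test
    "positive_control": 35,   # tube 2: a drug known to kill the bacteria
    "negative_control": 950,  # tube 3: a drug known to have no effect
}

# The negative control sets the baseline: how many bacteria survive
# when nothing kills them.
baseline = counts["negative_control"]

# A working set-up: the positive control should be well below baseline.
assert counts["positive_control"] < baseline, "Set-up problem: positive control failed"

# The new drug "works" if far fewer bacteria survive than at baseline,
# and ideally about as few as with the positive control.
if counts["new_drug"] < baseline:
    print("New drug reduced bacterial survival below the baseline.")
```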
So “controls” are important to scientists because they help us validate the performance of our experimental set-up and tell us what effects we can reasonably expect to observe.
Sometimes it appears that scientists worry about all sorts of problems before actually doing an experiment, but there’s a reason for this. One of the things scientists often want to do is to make comparisons between two things, such as comparing which of two drugs is better for treating a particular disease. It is important that the experiment to test this is designed to allow a “fair comparison”. If you want to compare which of two children can run the fastest, you could set up a race, shout “Go” and see who crosses the finish line first. But if one child had a 20-metre head start, it’s obvious that wouldn’t be a fair comparison.
Example from everyday life.
Consider a teacher who wants to test if a new way of teaching multiplication is better than the old way. The teacher could ask the class to split into two groups, teach one group the old way and the other group the new way. Afterwards, the teacher could give the students a test and see which group did better in the test.
That sounds reasonable, right?
But there are some problems with this. What would happen if the students stayed with their friends when they divided into groups? Perhaps the students who are good at maths tend to be friends with one another, so when they split into groups, one group had more students who were good at maths than the other. So now, even if both methods for teaching multiplication were equally good, one group might do better in the test simply because that group had better maths students. That could lead us to incorrectly conclude that one method of teaching multiplication was better than the other.
The solution to this problem is to “randomise” students to each of the groups – students would be assigned to each of the groups at random. And then it would be sensible to check that the “maths ability” of each group was the same. We could do this by looking at how well the students in each group did in their last maths test. We could work out the average results for both groups in the last maths test, and if they were similar it would be reasonable to use these groups to test the new method for teaching multiplication.
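For those who like to see it concretely, here is a minimal sketch in Python of that randomise-then-check step. The student names and previous test scores are invented for illustration.

```python
import random

# Hypothetical students with their scores in the last maths test.
students = {"Amy": 72, "Ben": 55, "Cara": 90, "Dan": 64,
            "Eve": 81, "Fred": 58, "Gia": 77, "Hal": 69}

names = list(students)
random.shuffle(names)                 # assign students to groups at random
group_a, group_b = names[:4], names[4:]

def average(group):
    return sum(students[name] for name in group) / len(group)

# Check the groups have similar maths ability before the experiment starts.
print("Group A average:", average(group_a))
print("Group B average:", average(group_b))
```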
By doing this, we are setting up a fair comparison.
We would have to worry about other things as well to make sure the comparison was fair. For example, the two groups should be taught by the same teacher, do the exam at the same time of day, in rooms with comparable amounts of noise and so on.
An example from Medicine.
Let’s return to our problem of comparing two medicines to treat a disease – is a new medicine better than the existing medicine? To do this, we could split patients at random into groups, give one group the old medicine and one group the new medicine and measure the effect of the drugs on the disease in the two groups.
We need to randomise the patients to each of the two groups. Some patients may be more sick than others, and they may be different ages, sexes, body weights, races and so on. All of these factors could affect how well the drug works. So we would need to check that any factor which might affect how the drug works is equally represented in the two groups.
Is this really worth worrying about?
Well, imagine the patients were not randomly distributed between the two groups. Perhaps one group had patients who were more sick, and this group received the new drug. Those patients might not respond to the drug as well as the other group, simply because they were more sick in the first place. We might conclude that the new drug is less effective, even though in reality it might be even better than the old drug. As a result, we might not use a valuable new drug, preventing patients from being treated as well as possible, and wasting the millions of pounds that were spent on developing the drug.
So it’s important that when we make comparisons, we set them up as fairly as possible so the results tell us what we are really trying to find out. Another way of making comparisons fair involves a concept called “Ratios and normalisation” and we will discuss that on another page.
As surprising as it may seem, scientists are human too. If you think we are like Star Trek’s Mr Spock, you would be very wrong. Many scientists get excited about what they do, and we work long hours because we love the subject. However, it’s also how we make our living, and we are judged on our research output. This is assessed by our ability to produce something new (new knowledge in academic science, a new product in industrial science). This creates a “conflict of interest” – scientists may feel pressure to get positive results because it directly affects their career. This can lead to scientists unintentionally (and on rare occasions, deliberately) interpreting data in favour of the result that is beneficial to them. To guard against these problems, there are some safeguards.
1. Blinding and Placebos
One common method used is called “blinding”. A good example to understand this is to consider what happens when new drugs are tested in patients. Let’s suppose we want to test a new drug to treat depression. One source of bias is that the patient may feel better simply because they have been given something they believe will make them better, even if the drug is ineffective. Another source of bias comes from the doctor assessing the effect of the drug, who may be inclined to believe the drug has worked, even if it hasn’t. The solution is that when the drug is tested, some patients receive the drug, while others receive a “placebo” – a tablet that looks like the tablet containing the drug but which contains no active drug. A code is used to determine whether each patient gets the drug or the placebo, but neither the patient nor the doctor knows which the patient has received. This removes the bias, because neither patient nor doctor knows what the patient received until the trial is completed and the results, which have already been registered, are decoded.
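As a rough illustration only (a real clinical trial uses far more careful procedures), here is how that coding step might be sketched in Python; the patient identifiers are invented.

```python
import random

# Hypothetical patient identifiers.
patients = ["P001", "P002", "P003", "P004", "P005", "P006"]

# Randomly assign half to the drug and half to the placebo.
random.shuffle(patients)
half = len(patients) // 2
allocation_key = {p: "drug" for p in patients[:half]}
allocation_key.update({p: "placebo" for p in patients[half:]})

# The key is locked away. During the trial, patients and doctors see
# only the patient codes, so neither knows who received what.
results = {}  # outcomes are registered against codes while still blinded

# Only once all the results are registered is the code broken ("decoded").
def unblind(results, allocation_key):
    return {p: (allocation_key[p], outcome) for p, outcome in results.items()}
```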
This type of blinded approach is often used in science when scientists have to use their judgement to interpret their results, rather than there being a clear measurement from laboratory instrumentation.
2. Improved data collection and analysis methods
Another way to remove bias is to remove subjective human judgement from the process entirely. For example, instead of asking a scientist to look at two sets of images and judge whether there is any difference between them, computerised image-analysis software can sometimes be used.
3. Making data accessible to all.
There is an emerging trend for scientists to make all their data widely available. Traditionally scientists have just published summaries of their data, but by making the underlying individual data available, it allows other scientists to re-examine how well the data has been analysed.
Sometimes it’s difficult to measure exactly what we want to measure, so we measure something else as an alternative. On occasion, that leads to headlines in the national press that make scientists sound crazy, if it’s not reported accurately. For example, scientists recently developed animals which glow in the dark, by injecting DNA from jellyfish into them. It sounds crazy, doesn’t it? Why would scientists do such a thing? Are we trying to create some kind of Frankenstein-like monster?
Well, it’s not crazy, there is a good reason for it, and it highlights another fundamental scientific principle. To understand this, you need to know that DNA contains instructions for making proteins, and if that DNA is introduced into a cell, the cell will start to make that protein.
Some drugs are large proteins, and these are difficult for us to make in a laboratory.
One solution to this is to make them in animals. As an example, ATryn is a drug which is used to prevent blood from clotting, and it is obtained from goats’ milk. The goats don’t normally make ATryn, but they have been genetically altered to do so. This is done by introducing into the goats the DNA that contains the instructions for making ATryn. The goats’ cells then make ATryn, which can then be purified from their milk.
To make drugs like this, we need methods to get the DNA into the animal’s cells in the first place. Similar methods may also be useful for fixing faulty genes in human patients. The problem, though, is that the methods we currently have for doing this are not very efficient, and so scientists would like to find better ones.
That is where the jellyfish come in. Jellyfish just happen to make a fluorescent protein. The fluorescent protein gives off light when ultraviolet light is shone onto it (it “fluoresces”). So an easy way to test different methods for getting DNA into animals is to use the jellyfish DNA that encodes the fluorescent protein. We can test different methods for introducing the DNA into the animals and see whether we were successful simply by seeing whether the animals fluoresce (“glow in the dark”). Once we have done that, we can use the method we have developed to transfer DNA encoding the protein drugs. So the “glow in the dark” protein is just being used as an easy way to measure the success of the experiments. If you like, it’s a “marker” for how well the experiment worked. Scientists very often use “markers” like this to make their work more efficient.
Sometimes we make use of “markers” because they are the only thing it is feasible to use. Suppose we want to develop a new drug to reduce cholesterol, because we think high levels of cholesterol can ultimately lead to death from heart attacks. What we would really like to do is to test the drug in patients and see if it reduces deaths from heart attacks. The problem with that approach is that it could take decades for high cholesterol to cause heart disease, so we might have to test the drug for more than 10 years to find out if it reduces the number of patients who suffer heart attacks. Can’t we do things more quickly? Well, we could instead test whether the drug reduces cholesterol in patients, in the expectation that lower cholesterol will result in fewer deaths from heart attacks. When we measure cholesterol in this way, it’s a “surrogate” for what we really want to measure (a drop in patient deaths).
It’s quite easy to understand what we mean if we say an event causes something to happen. For example, drinking too much alcohol and then driving a car can cause road traffic accidents. That is easy to understand because we have everyday knowledge of the effects of alcohol on driving. When we are doing research, however, we are starting from a position of limited knowledge about the subject we are studying (see The Scientific Method). So how can we go about discovering whether something causes something else? One way is to look for a correlation, in other words determining whether two things change in step with one another.
A famous example is the debate around global warming and carbon dioxide emissions. The correlation between global carbon dioxide levels and global warming can be seen here http://www.ncdc.noaa.gov/paleo/globalwarming/temperature-change.html
An example from everyday life.
Suppose we were interested in whether the increased use of computer games is a cause of obesity, because playing games causes people to lead less active lifestyles. We could look to see if data are available on computer games sales for the past twenty years and the proportion of people who are obese. If we saw that games sales and obesity increased around the same time, we could say we have identified a correlation.
But that wouldn’t necessarily mean that computer games cause obesity. There are several reasons why this correlation might exist even if playing computer games doesn’t cause obesity.
- The correlation might have occurred by pure chance. Sound unlikely? Suppose you notice that the amount of junk mail delivered varies from week to week. Now suppose you measure as many other weekly events as you can: how many times you saw a cat; how many times it rained; how many times you had to stop at a red light while driving; how many days you had toast for breakfast and so on. If you look at enough of these events, you are almost certain to find one that has the same pattern as the changes in junk mail delivery. In other words, if you look at enough things, correlations inevitably exist by pure chance (see the sketch after this list). Look at this website: http://www.tylervigen.com/spurious-correlations – it lists hundreds of crazy correlations that have occurred by chance, where one event does not cause the other.
- The correlations might occur because of “confounding factors”. Here is an example to explain that. It has been suggested that there is a correlation between sharks attacking humans and ice cream consumption. No-one is suggesting that the sharks are chasing people who are swimming while eating ice cream. However, people tend to eat more ice cream in the summer, and it’s in the summer that people tend to go swimming in the sea, increasing the chance of shark attacks (compared to being on dry land). In other words, the hot weather “creates” a correlation between ice cream consumption and shark attacks even though one does not cause the other.
- It’s difficult to infer the order of events from a correlation. Going back to our obesity and computer games example, we initially proposed that playing computer games leads to obesity. But isn’t the opposite also possible – that obesity, brought on by other factors, leads people to follow less active hobbies such as playing computer games?
- Some correlations are “inevitable”. There is a striking correlation between marriage and divorce. All people who get divorced have been married!
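To see how easily chance correlations arise (the first bullet above), here is a small simulation in Python. Every series, including the “junk mail” one, is just random numbers here, yet a strong-looking correlation still turns up.

```python
import random

random.seed(1)  # fix the random numbers so the example is repeatable

def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# A random "junk mail per week" series, and 1000 other random weekly series.
junk_mail = [random.gauss(0, 1) for _ in range(12)]
others = [[random.gauss(0, 1) for _ in range(12)] for _ in range(1000)]

# Among enough unrelated series, some will correlate strongly by pure chance.
best = max(others, key=lambda series: abs(correlation(junk_mail, series)))
print("Strongest chance correlation:", round(correlation(junk_mail, best), 2))
```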
To summarise this, here is a diagram which shows the potential explanations for why a correlation might be observed between two events, labelled A and B.
So hopefully you can see that a correlation is only a starting point: to investigate further, we need to design an appropriate experiment.
So how could we test if computer games cause obesity? We could encourage people to play computer games every day for several months, and see if they become obese. Then we could take away their computer games and see if they lose weight. This is called a prospective trial – because we start with the hypothesis that computer games cause obesity and directly test what happens in the future when we carry out a specific intervention (eg making people play with their computers). This is much more powerful than looking at historical data, because we can repeat it as often as we like to rule out the correlation occurring by chance or because of confounding factors.
Now if you really want to think about this, remember one behaviour could reinforce the other. Playing computer games could promote obesity, which in turn leads to playing more computer games, more obesity and so on. Biological scientists would call this positive feedback.
Normalisation refers to a process scientists use where they measure one thing, and express it as a ratio (“divided by”) of something else to make comparisons easier. It’s really not as complicated as it sounds, and you are almost certainly already familiar with the idea without realising it.
An example from everyday life.
Suppose you want to compare the fuel efficiency of two cars (let’s call them car A and car B). Car A can drive for 400 miles on a full tank of fuel, but car B can only drive 350 miles on a full tank.
This doesn’t really tell us which car is more efficient, because the cars may have different sized fuel tanks. Maybe car B drove a shorter distance because it has a smaller fuel tank. We can fix this by calculating how far each car went per gallon (or litre) of fuel.
Car A drove 400 miles on a full tank; its tank holds 10 gallons, so it did 400 miles / 10 gallons = 40 miles per gallon.
Car B drove 350 miles on a full tank; its tank holds 7 gallons, so it did 350 miles / 7 gallons = 50 miles per gallon.
Car B is actually more fuel efficient.
We have “normalised” the distance travelled to the fuel capacity of the tank.
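In code, that normalisation is a single division. Here it is in Python, using the figures above:

```python
def miles_per_gallon(miles_on_full_tank, tank_size_gallons):
    # Normalise distance travelled to the amount of fuel used.
    return miles_on_full_tank / tank_size_gallons

print(miles_per_gallon(400, 10))  # car A: 40.0 miles per gallon
print(miles_per_gallon(350, 7))   # car B: 50.0 miles per gallon
```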
Some scientific examples.
Many drugs work by binding to something called a “receptor”, think of it as the thing the drug sticks to for it to have its effect. If we wanted to measure how many receptors there are for a particular drug in different organs of the body (eg kidney, liver, brain), just measuring the total number of receptors wouldn’t make much sense because organs are different sizes. Instead, we could “normalise” the number of receptors to the mass (weight) of the organ and instead state the number of receptors per gram of tissue. That way we could compare the number of receptors in different organs in a meaningful way.
This example is a little more complex. Enzymes are a type of protein that make chemical reactions happen more quickly. For example, an enzyme called hexokinase speeds up a reaction in which glucose is converted to another chemical called glucose 6-phosphate. We call glucose 6-phosphate the “product” of the reaction. We might want to measure how quickly the product is produced when the enzyme is present. One way we could do this is to measure the amount of product that is produced by the enzyme over 10 minutes. However, scientists in other laboratories might measure the reaction over 5 minutes instead. Not surprisingly, the amount of product formed will be different depending on the period over which we follow the reaction. One solution to this is to express the speed of the reaction as “amount of product formed per minute”, which we calculate by dividing the amount of product formed by the duration of the reaction, in the same way as we expressed the fuel efficiency of the cars as miles per gallon.
However, the amount of product also depends on how much of the enzyme we add, and different scientists may test different amounts. If we add more enzyme, the reaction will go even faster and make more product. We can solve this by dividing the amount of product by the amount of enzyme we added. We can measure the amount (mass) of enzyme in milligrams (1 milligram is 1/1000th of a gram). So we divide the amount of product by the amount of enzyme added, giving “amount of product formed per milligram of enzyme”.
Now we need to combine those two ideas so we can take account of both the duration of the reaction and the amount of enzyme we used. We end up expressing this as “amount of product formed per minute per mg of enzyme”, calculated by dividing the amount of product formed by the duration of the reaction and also by the mass of enzyme added to the reaction. The advantage of this is that we can now compare different experiments where we might have used a different assay duration or a different mass of enzyme.
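Here is that double normalisation as a short Python function. The figures are invented for illustration, and show how two labs using different durations and amounts of enzyme can still be compared:

```python
def specific_activity(product_amount, duration_minutes, enzyme_mass_mg):
    """Amount of product formed per minute per mg of enzyme."""
    return product_amount / (duration_minutes * enzyme_mass_mg)

# Two hypothetical labs measuring the same enzyme under different conditions:
# lab 1: 100 units of product in 10 minutes with 2 mg of enzyme
# lab 2: 75 units of product in 5 minutes with 3 mg of enzyme
print(specific_activity(100, 10, 2))  # 5.0 units per minute per mg
print(specific_activity(75, 5, 3))    # 5.0 units per minute per mg
```

Despite the raw numbers looking different, the normalised rates agree, which is exactly the point of expressing the result this way.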
So next time you are presented with data expressed as “something per something”, you will know why.
Scientists worry a lot about the reliability of their measurements, because this can have a crucial effect on the success of an experiment.
An example from everyday life
Bill has decided he needs to lose weight and he wants to try a new diet. However, he isn’t convinced the diet will work. Bill decides to measure his weight before starting the diet, follow the diet for 2 weeks and then measure his weight again. Bill knows his weight can change during the day (eg before and after eating, exercising and so on), so he decides to measure his weight 3 times during the day before the diet, and 3 times during the day after the diet.
Here are the results
Before diet: 93.9, 94.0, 94.1 kg (average = 94 kg)
After diet: 91.9, 92.0, 92.1 kg (average = 92 kg).
It certainly looks like Bill lost some weight, around 2 kg. All the readings of his weight on each day are quite close together; we could say the data on each day isn’t very “scattered”, because every measurement is within 0.1 kg of that day’s average. The scatter of the data is small, and there is a noticeable difference between the average readings before and after the diet, so Bill can be reasonably confident the weight loss is real.
But now imagine his results look like this instead.
Before diet: 93, 94, 95 kg (average = 94 kg)
After diet: 91, 92, 93 kg (average = 92 kg).
This time, although the averages are still the same, the results on each day are much more scattered; the individual readings are up to 1 kg away from each day’s average. Can Bill still be sure he has really lost weight? Indeed, one reading before the diet was 93 kg, and one reading after the diet was also 93 kg. If you looked at those figures only, you might think the diet hadn’t worked.
The point is that it’s harder to tell if Bill has really lost weight when the data are less consistent (more “scattered”). So when we look at scientific results, we have to look at both the average values and the scatter of the data.
How can we tell if two sets of numbers are really different – did Bill really lose weight? If the difference between the averages is large, and the scatter of the data is small, it might be obvious. But in other cases, scientists use statistical tests that look at how big the difference is between the averages and how scattered the data are. The statistics allow us to calculate a probability (“P”) that the change in Bill’s weight is a real difference, rather than a difference that came about by pure chance because the data are so scattered. This might be written as “P<0.05”, which means that the probability is less than 5% that the difference we observed came about by chance. One way to think about this is that if we conclude the difference in Bill’s weight is real, there is less than a 5% chance that we are wrong.
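For readers who want to try this, here is a minimal sketch in Python using a standard two-sample t-test on Bill’s readings. SciPy is one common library for this; the choice of tool is ours for illustration, not something prescribed above.

```python
from statistics import mean, stdev
from scipy.stats import ttest_ind

before = [93.9, 94.0, 94.1]  # readings before the diet (kg)
after = [91.9, 92.0, 92.1]   # readings after the diet (kg)

print(mean(before), stdev(before))  # average and scatter before
print(mean(after), stdev(after))    # average and scatter after

# The t-test weighs the difference in averages against the scatter of the data.
result = ttest_ind(before, after)
print("P =", result.pvalue)  # tiny P: very unlikely to be pure chance

# With the more scattered readings, the same averages give a much larger P.
print("P =", ttest_ind([93, 94, 95], [91, 92, 93]).pvalue)
```

Run this and you will see the first (tightly clustered) data set gives a P value far below 0.05, while the scattered set does not, matching the intuition above.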
Scientists make a distinction between precision and accuracy. Precise results are ones in which all the readings are quite close together. In contrast, accurate results are ones which are close to the true value we are trying to measure. So results can be precise but inaccurate if the different measurements are very close together but a long way from the true value! Scientists often report their results with an estimate of the scatter of the data and sometimes refer to this as the “error” although it’s more properly called variance. The larger the “error”, the more scattered the data, and the less confidence we can have in the average value.
A Scientific example
What do scientists have to do if the difference between two groups is quite small and/or the data are quite scattered? The best solution is to improve the precision of our experiments, but that is not always possible.
Another solution is to test lots of samples. This increases the “power” of the statistics – by testing more samples it’s easier to detect small differences. This has real-world implications that affect all of us! For example, if a pharmaceutical company is testing a new drug in patients, but it’s anticipated that the data will be very scattered, then the drug has to be tested in lots of patients before we can be sure any effect of the drug is real. But testing the drug in lots of patients makes the cost of testing much higher, and consequently raises the price we have to pay for the drug. So an apparently mundane subject such as statistics can affect whether we can afford a new drug!
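As a rough sketch of this trade-off, here is how a sample-size calculation might look in Python, using the statsmodels library as one possible choice (again, our choice for illustration). The “effect size” is the expected difference divided by the scatter, so more scattered data means a smaller effect size and many more patients:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# As the data get more scattered relative to the drug's effect,
# the effect size shrinks and the required number of patients soars.
for effect_size in [0.8, 0.4, 0.2]:
    n = analysis.solve_power(effect_size=effect_size, power=0.8, alpha=0.05)
    print(f"effect size {effect_size}: about {n:.0f} patients per group")
```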
If you talk to a scientist, you might think that they can’t give you a straight answer. For example, if you ask a scientist “is this drug safe?”, it’s very unlikely you will get a simple “yes”; the answer will probably be accompanied by some qualifying statements. And although this might irritate you, there are many good reasons for it. Some of the reasons scientists are cautious in the way they present their findings are given below.
Truths aren’t always universal
Some things may be true in some circumstances but not in others. For example, is it a good idea to do exercise and keep fit? Yes. Unless you are sick and need a period of rest to recover. Returning to our original example, is it safe to take this drug? Yes, but not for all patients. So when scientists talk about their work, if it’s not presented as simply as you might like, it’s not that we are being awkward, it’s probably that we are trying to be accurate.
Absence of evidence is not evidence of absence.
This complicated sentence means that even if you can’t detect something, it may still be there. For example, before the invention of the microscope we couldn’t see bacteria (these are really small, about 1/1000th of a millimetre). That didn’t mean they weren’t there, rather that we lacked the right equipment. We refer to this as sensitivity – it’s a reflection of our ability to measure something that really is there. (A related term, “specificity”, is sometimes also used; this refers to our ability to correctly conclude that something is absent when it really isn’t present.)
Remember that question earlier on about whether a drug is safe or not? Well, we only truly know if a drug is safe by giving it to patients. Suppose we test the drug in 1000 patients and there are no problems. Does that mean it’s safe? What if it causes severe problems (eg a heart attack) in 1 in 10,000 patients, so that no problems were found when only 1000 patients were tested? Do you still think it’s safe? So how many patients would you have to test to prove with absolute certainty that it’s always safe? The answer is that we can never do that.
False-positives and false-negatives.
Scientists are well aware that when they measure something, they can sometimes get the wrong result. For example, John goes to his doctor to be tested to see if he has diabetes.
If he really does not have diabetes but the test is wrong and indicates he does have diabetes, we could call that a “false-positive” – the test was “false” in saying he had tested positive for diabetes.
On the other hand, if he really does have diabetes but the test is wrong and indicates that he does not have diabetes, we would call that a “false-negative” – the test was “false” in saying he had tested negative for diabetes.
When scientists design tests, they have to worry how often false-positives and false-negatives occur. If they are too frequent, the results of the test may not be trustworthy. This makes us cautious about interpreting results.
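Here is a minimal sketch in Python of how these rates connect to the sensitivity and specificity mentioned above, with invented numbers for a hypothetical diabetes test:

```python
# Hypothetical results of a diabetes test on 1000 people.
true_positives = 90    # have diabetes, test correctly says yes
false_negatives = 10   # have diabetes, test wrongly says no
true_negatives = 855   # don't have diabetes, test correctly says no
false_positives = 45   # don't have diabetes, test wrongly says yes

# Sensitivity: ability to detect the disease when it really is there.
sensitivity = true_positives / (true_positives + false_negatives)

# Specificity: ability to correctly rule the disease out when it isn't there.
specificity = true_negatives / (true_negatives + false_positives)

print(f"sensitivity = {sensitivity:.0%}, specificity = {specificity:.0%}")
```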
Looking carefully at the primary data – not taking people’s word for things.
Recently in the UK, the law was changed to allow fathers to take more time off work when their child was born. A survey found that only 1% of men had taken parental leave, suggesting that men weren’t taking up the opportunity. But then it transpired that only a small proportion of the men questioned in the survey had actually had a child recently. Of course men who hadn’t had a child recently hadn’t taken parental leave recently; the survey was absurd. But that didn’t stop it from making national news, and it could have gone on to influence government policy. Looking closely at the “primary data”, rather than relying on someone else’s interpretation, can be helpful in overcoming this problem.
What do numbers really mean?
Often when you hear someone quote a figure to support their argument, they are using it to make a point. And scientists are just the same. However, we all need to stop and ask ourselves “what do these numbers really mean?”
Suppose that there is an activity that many of us do every day in the UK and I want to convince you that it’s safe. Perhaps I might tell you that less than 0.0027% of the UK population die from it every year. Sounds pretty safe, right?
Now let’s do a little maths. There are around 64 million people in the UK, and 0.0027% of this is approximately 1700 people. So, in fact, 1700 people die from this activity every year in the UK. Does it still sound as safe, now we have looked at “what the numbers really mean”? What was the activity? Well, 1700 people died in the UK in 2013 in road traffic accidents.
In the same way, scientists have to think carefully about the implications of their data. For example, drinking alcohol can increase the risk of getting breast cancer by approximately 10%. But what does this figure really mean? Well, the risk of a woman getting breast cancer during her life is approximately 12% if she doesn’t drink alcohol. That means 12 in 100 women who do not drink alcohol are likely to develop breast cancer during their life. If a woman drinks alcohol, the risk is increased by 10% of this value. 10% of 12 is approximately 1, so 1 more woman in those 100 is likely to get breast cancer; in other words, approximately 12 + 1 = 13 of 100 women who drink alcohol are likely to get breast cancer.
In summary, 12 in 100 women are likely to get breast cancer if they don’t drink alcohol, and this risk rises to 13 in 100 if they do, which represents approximately a 10% increase in risk. You can look at those figures and make your own mind up about your own alcohol consumption, but the sobering fact is that these figures mean alcohol consumption contributes to around 5000 cases of breast cancer in the UK each year.
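Both of those calculations take only a few lines. Here they are in Python, using the figures quoted above:

```python
# Road deaths: a tiny percentage of a large population is still many people.
uk_population = 64_000_000
deaths = uk_population * 0.0027 / 100
print(f"{deaths:.0f} people per year")  # approximately 1700 people

# Breast cancer: a 10% relative increase on a 12-in-100 baseline risk.
baseline_risk = 12        # women in 100 likely to develop breast cancer
relative_increase = 0.10  # alcohol raises the risk by 10% of this value
new_risk = baseline_risk * (1 + relative_increase)
print(f"risk rises from {baseline_risk} to about {new_risk:.0f} in 100")
```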
Over-interpreting data.
This refers to the idea that when a scientist interprets an observation, we shouldn’t read more into it than the observation can support. A great example of over-interpretation of data is an imaginary conversation between two astronomers in the 19th century (this example is adapted from Carl Sagan’s book “Cosmos”). It goes like this.
“There is absolutely nothing to see on Venus when I look with my telescope”
“I wonder why that is? Maybe the surface is obscured by clouds”
“That’s possible. What type of landscapes are covered by clouds?”
“Well, landscapes with lots of liquid on the surface”
“That’s right. What types of planets have lots of liquids on their surface?”
“Oh, I don’t know. Planets with seas, I guess, and maybe marshes.”
“Good point. What do we find in the sea or in marshland ?”
“Well, often there are fish, or seabirds, maybe some mammals that like water.”
Observation: there is nothing to see on Venus.
Conclusion: Venus is teeming with wildlife.
Clearly that is nonsense and it’s hard to believe something so absurd would happen.
One solution to the problem is to keep our interpretation as simple as possible. This is sometimes referred to as “Occam’s razor” which essentially says when we are trying to interpret our observations, it’s probably safest to use the simplest explanation we can.
Another solution is to be very critical when we think about what exactly it is we have measured, and to realise it may not be what we hoped we were measuring. For example, if I decided to measure global warming by measuring the temperature in one location (eg outside my house) for a few months, that wouldn’t tell us about annual global temperatures; it would just give information about the temperature outside my house for those few months. There are lots of reasons why this may not reflect what is happening on the rest of the planet for the rest of the year. For example, maybe the house is in the shade of another building, or maybe the months of the measurements were unusually cold compared to the rest of the year. So it’s important to remember that what we would like to measure and what we are actually measuring are not necessarily the same thing.
Thinking of other explanations for data.
Sometimes, the most obvious explanation of some data isn’t the correct one. For example, suppose we see an increase in the recorded incidence of a disease. The obvious explanation is that whatever causes the disease has increased. But another explanation is that we have simply become better at detecting the disease. Similarly, if you are in your house and look outside to notice an unusual amount of traffic congestion, what is going on? Is there a problem in the road ahead causing the traffic to tail back (the obvious explanation)? Or is there a problem somewhere else – perhaps a main road has been closed, creating a diversion which forces an abnormally large volume of traffic past your house? In that case the problem isn’t in the road leading from your house at all; it’s the diversion causing lots of cars to be forced past your house.
Scepticism and Peer review.
All this tends to make scientists quite sceptical. Indeed, scepticism is even built into the scientific method. You will recall from the page on “The Scientific Method” that we start with a “hypothesis”. In fact, we start off with what is called the “null hypothesis” which essentially says that whatever our favourite theory is, it’s wrong! It’s up to us as scientists to gather data to prove that the null hypothesis is incorrect. If you like, we deliberately stack the odds against ourselves to help ensure we get it right in the end.
Added to that, scientists like nothing better than trying to poke holes in each other’s work. We are incredibly critical of each other’s work, hopefully in a non-personal way. It’s up to us to convince our colleagues (our “peers”) that our observations and conclusions are correct, and that is where peer review comes in. Before we publish our results, they are reviewed by our peers to see if our conclusions are justified. Peer review doesn’t guarantee this, but it goes some way to ensuring a reasonable standard is met.