Remember Henery Hawk, that little chicken hawk who was always pestering Foghorn Leghorn? That’s the image that should come to mind the next time you read of a little 12-person (or 20-rat, or 40-mouse) study claiming to have found a cure for cancer, wrinkles, or even boredom. It’s not that small studies are worthless (they’re not), it’s that they’re worthless for making Big Decisions about treatment, lifestyle, risk, or where to spend your money. The reason for this, in layman’s terms, is chance.
When a research study sets out to really prove or disprove something, it enlists hundreds or thousands of participants. This makes the study very, very expensive. A small army of health professionals, administrative staff, software engineers, and research assistants is needed to train, treat, and track that many participants. Why go to all that expense, all the grant writing and organizing and stress and strain? Because it’s the only way to rule out chance or, as epidemiologists like to call it, sampling error.
What, exactly, is sampling error? There’s a superb example in the Harry Potter movies, in the form of Bertie Bott’s Every Flavour Beans. For example, at the end of Sorcerer’s Stone, Dumbledore admits that he stopped eating them after having once encountered a vomit-flavored bean. The thing with Every Flavour Beans is that they come in every flavor. So if you try just a few, you could easily conclude that (1) they’re the most delicious things on earth, or (2) they’re the most noxious substance under the sun, or (3) neither of the above. Because your sample is so small, all you would really be reporting is your personal experience, which can be very different from what the beans are really like. This is sampling error, and it’s why size matters. And the more important and complicated and ambiguous the question, the more size matters.
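To make the bean example concrete, here is a minimal simulation sketch. The 30% share of delicious beans, the number of tasters, and the function names are all made up for illustration; nothing here comes from the books or films. It shows how widely the verdicts of individual tasters can differ when each one eats only a few beans.

```python
import random

# Hypothetical bag: assume 30% of the beans are delicious (illustrative only).
TRUE_SHARE_DELICIOUS = 0.30

def taste_test(n_beans, rng):
    """Eat n_beans at random and return the share that tasted delicious."""
    delicious = sum(rng.random() < TRUE_SHARE_DELICIOUS for _ in range(n_beans))
    return delicious / n_beans

def spread_of_verdicts(n_beans, n_tasters=200, seed=1):
    """Return the (lowest, highest) share reported by n_tasters independent tasters."""
    rng = random.Random(seed)
    verdicts = [taste_test(n_beans, rng) for _ in range(n_tasters)]
    return min(verdicts), max(verdicts)

# Compare tasters who eat a handful of beans with tasters who eat thousands.
for n in (5, 50, 5000):
    low, high = spread_of_verdicts(n)
    print(f"{n:>5} beans each: verdicts range from {low:.0%} to {high:.0%}")
```

The bag never changes between runs; only the number of beans each taster eats does. The range of verdicts should narrow toward the true share as the handful grows.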
Sampling error is problematic enough when only one or two factors are involved (a vote for or against, a preference for toffee versus earwax flavor), but when a large number of variables are involved (think of the array of side effects that accompany most new drugs), then sample size deserves huge weight.
Instead of jelly beans, let’s say you’re trying to decide whether to take oral contraceptives. If you ask ten different women what side effects they experienced, you may well get ten different sets of side effects. Some of those answers will be common (weight gain, possibly), and others may have no overlap and range from minor (mild headaches) to severe (cancer). Given ten different answers, how do you decide what the risks truly are?
You need to expand your sample size. If you were to ask ten thousand women the same question, then your sample size would give you a far better sense of the probability of your developing any given symptom. If six thousand women out of those ten thousand experience a given side effect, you know something very different than if only three, or even six, of your ten friends experienced it.
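One way to put numbers on “something very different” is to compute a plausible range for the true rate in each case. The sketch below uses the Wilson score interval at a 95% confidence level. That is my choice of method for illustration, not something this post prescribes, and the function name is hypothetical.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level

def wilson_interval(affected, surveyed, z=Z_95):
    """Wilson score interval for a proportion, returned as (low, high)."""
    p_hat = affected / surveyed
    denom = 1 + z**2 / surveyed
    center = (p_hat + z**2 / (2 * surveyed)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / surveyed + z**2 / (4 * surveyed**2)
    ) / denom
    return center - half_width, center + half_width

# Same observed rate (60%), very different amounts of evidence.
for affected, surveyed in ((6, 10), (6000, 10000)):
    low, high = wilson_interval(affected, surveyed)
    print(f"{affected} of {surveyed}: plausibly {low:.0%} to {high:.0%}")
```

Six of ten friends is consistent with a true rate anywhere from roughly a third to more than four in five. Six thousand of ten thousand pins the rate down to within about a percentage point of 60%.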
Here’s a chart from the Pew Research Center, a non-partisan opinion polling agency. This chart is about sampling error in the context of opinion polls, not medicine, so you can ignore the specific numbers. What’s important here is the concept, and the trend. This chart is a diagram of why size matters.
Notice that the sample size, or number of participants, increases as the red line moves to the right. And as it moves to the right, the red line drops (the margin of error decreases). So, the bigger the study, the better the evidence.
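The shape of that trend can be reproduced with the textbook formula for the margin of error of a simple random sample. This is a sketch under assumptions: a 95% confidence level and the worst-case 50/50 split, which may not match how any particular pollster computes its published figures. The function name is mine.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level

def margin_of_error(sample_size, proportion=0.5):
    """Margin of error (as a fraction) for a simple random sample.

    proportion=0.5 is the worst case, giving the widest margin.
    """
    return Z_95 * math.sqrt(proportion * (1 - proportion) / sample_size)

for n in (100, 400, 1000, 2500, 10000):
    print(f"n = {n:>6}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Because the sample size sits under a square root, quadrupling the number of participants only halves the margin of error. That is part of why definitive studies get so large and so expensive.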
Margin of Error
Margin of error is the technical version of “size matters.” It is the amount by which the results might differ from reality, usually stated in percentage points. A margin of error of 10% means that the study finding, whatever it was, is really “plus or minus 10 percentage points.” This can be a huge margin or a narrow margin, depending on the study finding.
For example, if your study showed a 10% risk of some outcome, with a 10% margin of error, it means the actual risk is not 10%, but somewhere between 0% and 20%. On the other hand, if the study found a 60% risk, then the actual risk is somewhere between 50% and 70% (the error, relative to the conclusion, is less of an issue, but still an issue).
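The arithmetic in that example is small enough to sketch in a few lines. The helper names are hypothetical, and the margin is treated as percentage points added to and subtracted from the finding, as in the paragraph above.

```python
def plausible_range(finding_pct, margin_pct):
    """Return (low, high) in percent, clamped to the 0-100 range."""
    low = max(0.0, finding_pct - margin_pct)
    high = min(100.0, finding_pct + margin_pct)
    return low, high

def relative_error(finding_pct, margin_pct):
    """Margin of error as a fraction of the finding itself."""
    return margin_pct / finding_pct

# The two cases from the text: a 10% finding and a 60% finding,
# each with a margin of error of 10 percentage points.
for finding in (10, 60):
    low, high = plausible_range(finding, 10)
    rel = relative_error(finding, 10)
    print(f"finding {finding}%: between {low:.0f}% and {high:.0f}% "
          f"(margin is {rel:.0%} of the finding)")
```

For the 10% finding, the margin is as large as the finding itself. For the 60% finding, it is about a sixth of it, which is why the same margin is less troubling there.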
So, margin of error is a handy shorthand measure of reliability (the quality of the study data, in which size is a big factor). And because of this, you can also think of it as a shorthand measure of significance (whether the study is useful for Big Decisions or only water cooler conversation).
In medical research, there’s a related concept called the “Confidence Interval” or “CI”. Like the margin of error, this is a loose measure of how reliable and significant the study conclusions are. Since CIs are one of the primary tools for evaluating the import of a medical study, and there are some interesting nuances involved, I’m going to devote a separate blog post to them. For now, bear in mind that, as with margin of error, smaller is better for confidence intervals, and that in general, the bigger the study, the smaller the confidence intervals.
Duration is Also Size
Finally, if the study involves a chronic condition like hypertension, or coronary artery disease, or cancer, or obesity, then the study has to be big in another way: Duration.
Duration is a huge issue for cancer preventative drugs or risks. Ditto weight loss products or sunblocks or supplements. It would be meaningless to put a large number of people on a diet or a new drug and watch them for only a few months. Even if the study population showed a benefit (or risk), and the effects could be tied closely to the drug or risk factor, there would be no way to know whether the effect was lasting. For most of us, lasting is what counts. Would you spend $10,000 on a pill that was going to help only for a little while? If “help” means prolonging your life, the answer might be yes. But if “help” means weak evidence of benefit in exchange for potentially serious side effects, the answer might well be no thanks!
So, the next time someone’s trying to get you all worked up over a new stay-young-forever concoction, or a cancer prevention pill, instead of tearing your hair or lining up at the clinic for the Pill-du-Jour, you can calmly say, “Sorry, but your puny sample size just doesn’t do it for me.”
