Merck Manual

The Science of Medicine


Oren Traub, MD, PhD, Pacific Medical Centers

Last full review/revision Sep 2018 | Content last modified Sep 2018

Doctors have been treating people for many thousands of years. The earliest written description of medical treatment is from ancient Egypt and is over 3,500 years old. Even before that, healers and shamans were likely providing herbal and other remedies to the ill and injured. A few remedies, such as those used for some simple fractures and minor injuries, were effective. However, until very recently, many medical treatments did not work and some were actually harmful.

Two hundred years ago, common remedies for a wide range of disorders included cutting open a vein to remove a pint or more of blood and giving various toxic substances to produce vomiting or diarrhea—all dangerous to a sick or injured person. About 120 years ago, along with mention of some useful drugs such as aspirin and digitalis, The Manual mentioned cocaine as a treatment for alcoholism, arsenic and tobacco smoke as treatments for asthma, and sulfuric acid nasal spray as a treatment for colds. Doctors thought they were helping people. Of course, it is not fair to expect doctors in the past to have known what we know now, but why had doctors ever thought that tobacco smoke might benefit someone with asthma?

There were many reasons why doctors recommended ineffective (and sometimes harmful) treatments and why people accepted them:

  • Typically, there were no effective alternative treatments.

  • Doctors and sick people often prefer doing something to doing nothing.

  • People are comforted by turning problems over to an authority figure.

  • Doctors often provide much-needed support and reassurance.

Most importantly, however, doctors could not tell which treatments worked.

Treatment and recovery: Cause and effect?

If one event comes immediately before another, people naturally assume the first is the cause of the second. For example, if a person pushes an unmarked button on a wall and a nearby elevator door opens, the person naturally assumes that the button controls the elevator. The ability to make such connections between events is a key part of human intelligence and is responsible for much of our understanding of the world. However, people often see causal connections where none exist. That is why athletes might continue to wear the "lucky" socks they had on when they won a big game, or a student might insist on using the same "lucky" pencil to take exams.

This way of thinking is also why some ineffective medical treatments were thought to work. For example, if an ill person’s fever broke after the doctor drained a pint of blood or the shaman chanted a certain spell, then people naturally assumed those actions must have been what caused the fever to break. To the person desperately seeking relief, getting better was all the proof necessary. Unfortunately, the apparent cause-and-effect relationships observed in early medicine were rarely correct, but belief in them was enough to perpetuate centuries of ineffective remedies. How could this have happened?

People get better spontaneously. Unlike “sick” inanimate objects (such as a broken axe or a torn shirt), which remain damaged until repaired by someone, sick people often get well on their own (or despite their doctor’s care) if the body heals itself or the disease runs its course. Colds are gone in a week, migraine headaches typically last a day or two, and food poisoning symptoms may stop in 12 hours. Many people even recover from life-threatening disorders, such as a heart attack or pneumonia, without treatment. Symptoms of chronic diseases (such as asthma or sickle cell disease) come and go. Thus, many treatments may seem to be effective if given enough time, and any treatment given near the time of spontaneous recovery may seem dramatically effective.

The placebo effect may be responsible. Belief in the power of treatment is often enough to make people feel better. Although belief cannot cause an underlying disorder, such as a broken bone or diabetes, to disappear, people who believe they are receiving a strong, effective treatment very often feel better. Pain, nausea, weakness, and many other symptoms can diminish even if a pill contains no active ingredients and can be of no possible benefit, such as a sugar pill (termed a placebo). What counts is the belief.

An ineffective (or even harmful) treatment prescribed by a confident doctor to a trusting, hopeful person often results in remarkable improvement of symptoms. This improvement is termed the placebo effect. Thus, people might see an actual (not simply misperceived) benefit from a treatment that has had no real effect on the disease itself.

Why does it matter? Some people argue that the only important thing is whether a treatment makes people feel better. By this argument, it does not matter whether the treatment actually “works,” that is, affects the underlying disease. This argument may be reasonable when the symptom is the problem, such as in many day-to-day aches and pains, or in illnesses such as colds, which always go away on their own. In such cases, doctors do sometimes prescribe treatments for their placebo effect. However, in any dangerous or potentially serious disorder, or when the treatment itself may cause side effects, it is important for doctors not to miss an opportunity to prescribe a treatment that really does work.

How Doctors Try to Learn What Works

Because some doctors realized long ago that people can get better on their own, they naturally tried to compare how different people with the same disease fared with or without treatment. However, until the middle of the 19th century, it was very difficult to make this comparison. Diseases were so poorly understood that it was difficult to tell when two or more people had the same disease.

Doctors using a given term were often talking about different diseases entirely. For example, in the 18th and 19th centuries, the diagnosis of “dropsy” was given to people whose legs were swollen. We now know that swelling can result from heart failure, kidney failure, or severe liver disease—quite different diseases that do not respond to the same treatments. Similarly, numerous people who had fever and who were also vomiting were diagnosed with “bilious fever.” We now know that many different diseases cause fever and vomiting, such as typhoid, malaria, and hepatitis.

Only when accurate, scientifically based diagnoses became common around the beginning of the 20th century could doctors begin to effectively evaluate treatments. However, they still had to determine how to best evaluate a treatment.

Sample size

First of all, doctors realized they had to look at more than one sick person's response to treatment. One person getting better (or sicker) might be a coincidence. Good results in many people are less likely to be due to coincidence. The larger the number of people (sample size), the more likely any observed effect is real.
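The effect of sample size can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions: it supposes that 70% of people recover on their own, that the treatment does nothing at all, and it asks how often a study would still see 80% or more recover purely by coincidence. The function names and the 70% and 80% figures are hypothetical choices made for this example.

```python
from math import comb


def prob_at_least(k_min, n, p):
    """Exact binomial probability that k_min or more of n people recover
    when each person recovers independently with probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def chance_of_coincidence(n, baseline_pct=70, observed_pct=80):
    """Chance that a useless treatment still shows a recovery rate of
    observed_pct or better in n people (hypothetical percentages)."""
    # Smallest whole number of recoveries that reaches observed_pct of n.
    needed = (observed_pct * n + 99) // 100
    return prob_at_least(needed, n, baseline_pct / 100)


for n in (10, 100, 1000):
    print(f"{n:>5} people: chance of a misleading result = {chance_of_coincidence(n):.3g}")
```

Under these assumptions, the chance of a misleading result shrinks as the number of people grows, which is the point of studying many people rather than a few.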

Control groups

Even if doctors find a good response to a new treatment in a large group of people, they still do not know whether the same number of people (or more) would have gotten well on their own or done even better with a different treatment. Thus, doctors typically compare results between a group of people who receive a study treatment (treatment group) and another group (control group) who receive

  • An older treatment

  • A dummy treatment (a placebo, such as a sugar pill)

  • No treatment at all

Studies that involve a control group are called controlled studies.

Time frame

At first, doctors simply gave all their patients with a certain illness a new treatment and then compared their results to a control group of people treated at an earlier time (either by the same or different doctors). The previously treated people are considered a historical control group. For example, if doctors found that 80% of their patients survived malaria after receiving a new treatment, whereas previously only 70% had survived, then they might conclude that this new treatment was more effective.

A problem with making comparisons to results from an earlier time is that advances in general medical care in the time between the old and the new treatments may be responsible for any improvement in outcome. It is not fair to compare the results of people treated in 2015 with those treated in 1985.

To avoid this problem with historical control groups, doctors try to create treatment groups and control groups at the same time and observe the results of the treatment as they unfold. Such studies are called prospective studies.

Comparing apples to apples

The biggest concern with all types of medical studies, including historical studies, is that similar groups of people should be compared.

In the previous example, if the group of people who received the new treatment (treatment group) for malaria was made up of mostly young people who had mild disease, and the previously treated (control) group was made up of older people who had severe disease, it might well be that people in the treatment group fared better simply because they were younger and healthier. Thus, a new treatment could falsely appear to work better.
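The malaria example above can be worked through with made-up numbers. In the sketch below, every count is hypothetical, and the treatment is assumed to have no effect at all: survival depends only on whether the disease is mild (90% survive) or severe (60% survive). The treatment group is mostly mild cases and the earlier control group is mostly severe cases.

```python
# (group, severity) -> (number of people, number who survived).
# All counts are hypothetical and chosen only to illustrate the problem.
counts = {
    ("treatment", "mild"): (80, 72),
    ("treatment", "severe"): (20, 12),
    ("control", "mild"): (20, 18),
    ("control", "severe"): (80, 48),
}


def survival_rate(group, severity=None):
    """Survival rate for a group, optionally restricted to one severity level."""
    people = survived = 0
    for (g, s), (n, alive) in counts.items():
        if g == group and (severity is None or s == severity):
            people += n
            survived += alive
    return survived / people


print("Overall:", survival_rate("treatment"), "vs", survival_rate("control"))
for severity in ("mild", "severe"):
    print(severity, survival_rate("treatment", severity), "vs", survival_rate("control", severity))
```

With these numbers, the overall survival rate looks much better in the treatment group, yet people with mild disease fare the same in both groups, and so do people with severe disease. The apparent benefit comes entirely from comparing unlike groups.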

Many other factors besides age and severity of illness also must be taken into account, such as

  • The overall health of people being studied (people with chronic diseases such as diabetes or kidney failure tend to fare worse than healthier people)

  • The specific doctor and hospital providing care (some may be more skilled and have better facilities than others)

  • The percentages of men and women that make up the study groups (men and women may respond differently to treatment)

  • The socioeconomic status of the people involved (people with more resources to help support them tend to fare better)

Doctors have tried many different methods to ensure that the groups being compared are as similar as possible, but there are two main approaches:

  • Case-control studies: Precisely pairing people who receive the new treatment (cases) with those who do not (controls) based on as many factors as possible (age, gender, health, etc)

  • Randomized trials: Randomly assigning people to each of the study groups

Case-control studies seem sensible. For example, if a doctor were studying a new treatment for high blood pressure (hypertension), and one person in the treatment group was 42 years old and had diabetes, then the doctor would try to place a person in their early 40s with hypertension and diabetes in the control group. However, there are so many differences among people, including differences that the doctor does not even think of, that it is nearly impossible to intentionally create an exact match for each person in a study.

Randomized trials take care of this problem using a completely different approach. Perhaps surprisingly, the best way to ensure a match between groups is not to try at all. Instead, the doctor takes advantage of the laws of probability and randomly assigns (typically with the aid of a computer program) people who have the same disease to different groups. If a large enough group of people is divided randomly, the odds are that people in each group will have similar characteristics.
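Random assignment can be sketched in a few lines. This is a minimal illustration, not the procedure any particular trial uses: the participants, their ages, and the 15% rate of diabetes are invented, and the function names are hypothetical. Nobody tries to match anyone, yet with a large group the two halves usually end up looking alike.

```python
import random
from statistics import mean


def make_hypothetical_people(n, seed=0):
    """Invent n participants with a random age and diabetes status."""
    rng = random.Random(seed)
    return [
        {"id": i, "age": rng.randint(20, 80), "diabetes": rng.random() < 0.15}
        for i in range(n)
    ]


def randomize(participants, seed=None):
    """Shuffle the participants and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


people = make_hypothetical_people(10_000)
treatment, control = randomize(people, seed=1)

for name, group in (("treatment", treatment), ("control", control)):
    avg_age = mean(p["age"] for p in group)
    pct_diabetes = 100 * mean(p["diabetes"] for p in group)
    print(f"{name}: average age {avg_age:.1f}, diabetes {pct_diabetes:.1f}%")
```

The same balancing applies to characteristics the doctor never thought to record, which is something deliberate matching cannot achieve. With small groups, chance differences between the halves can still be large.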

Prospective, randomized studies are the best way to make sure that a treatment or test is being compared between equivalent groups.

Eliminating other factors

Once doctors have created equivalent groups, they must make sure that the only difference they allow is the study treatment itself. That way, doctors can be sure that any difference in outcome is due to the treatment and not to some other factor, such as the quality or frequency of follow-up care.

The placebo effect is another important factor. People who know they are receiving an actual, new treatment rather than no treatment (or an older, presumably less effective treatment) often expect to feel better. Some people, on the other hand, may expect to experience more side effects from a new, experimental treatment. In either case, these expectations can exaggerate the effects of treatment, causing it to seem more effective or to have more complications than it really does.

Blinding is a technique used to avoid the problems of the placebo effect. People in a study must not know whether they are receiving a new treatment. That is, they are “blinded” to this information. Blinding is usually accomplished by giving people in the control group an identical-appearing substance, usually a placebo—something with no medical effect.

When an effective treatment for a disease already exists, it is not ethical to give the control group a placebo. In those situations, the control group is given an established treatment—something that is already known to be effective in treating the disease. But whether a placebo or an established drug is used, the substance must appear identical to the study drug, so that people cannot tell whether they are taking the study drug. If the treatment group receives a red, bitter liquid, then the control group should also receive a red, bitter liquid. If the treatment group receives a clear solution given by injection, then the control group should receive a similar injection.

Double-blinding takes things one step further. Because the doctor or nurse might accidentally let a person know what treatment they are receiving, and thus "unblind" the person, it is better if all involved health care practitioners remain unaware of what is being given. Double-blinding usually requires that a person separate from the study, such as a pharmacist, prepare identical-appearing substances that are labeled only by a special number code. The number code is broken only after the study is completed.
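The number-code scheme described above can be sketched as follows. This is a simplified illustration under assumptions made for the example: the six-digit codes, the two arm names, and the function names are all hypothetical, and a real study would use a more carefully controlled process. The idea is that the person preparing the kits keeps the key, while doctors, nurses, and participants see only the codes.

```python
import random


def prepare_coded_kits(participant_ids, seed=None):
    """Role of the independent preparer (for example, a pharmacist):
    assign arms at random and label each kit with an opaque number code."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    # Roughly half the kits contain the study drug, half the placebo.
    arms = ["study drug", "placebo"] * (len(ids) // 2) + ["study drug"] * (len(ids) % 2)
    rng.shuffle(arms)
    # Unique six-digit codes that reveal nothing about the contents.
    codes = rng.sample(range(100_000, 1_000_000), len(ids))
    blinded_labels = dict(zip(ids, codes))  # what doctors and participants see
    sealed_key = dict(zip(codes, arms))     # kept by the preparer until the end
    return blinded_labels, sealed_key


def unblind(blinded_labels, sealed_key):
    """Break the code after the study is completed."""
    return {pid: sealed_key[code] for pid, code in blinded_labels.items()}


labels, key = prepare_coded_kits(range(1, 9), seed=42)
print("During the study:", labels)
print("After the study: ", unblind(labels, key))
```

During the study, only `labels` is shared, so nothing in it says who is receiving the study drug. The key is combined with the labels only once all results are in.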

An additional reason for double-blinding is that the placebo effect can affect even the doctor, who may unconsciously think a person receiving treatment is doing better than a person receiving no treatment, even if both are faring exactly the same.

Not all medical studies can be double-blinded. For example, surgeons studying two different surgical procedures obviously know which procedure they are performing (although the people undergoing the procedures can be kept unaware). In such cases, doctors make sure that the people evaluating the outcome of treatment are blinded as to what has been done so they cannot unconsciously bias the results.

Choosing a clinical trial design

The best type of clinical trial is

  • Prospective

  • Randomized

  • Placebo controlled

  • Double blinded

This design allows for the clearest determination of the effectiveness of a treatment. However, in some situations, this trial design may not be possible. For example, with very rare diseases, it is often hard to find enough people for a randomized trial. In those situations, retrospective case-control trials are often conducted.
