
Test results may help make a diagnosis in symptomatic patients (diagnostic testing) or identify occult disease in asymptomatic patients (screening). However, test results may interfere with clinical decision making if the test poorly discriminates between patients with and without disease, if the result is inconsistent with the clinical picture, or if the test result is improperly integrated into the clinical context.
Laboratory tests are imperfect and may mistakenly identify some healthy people as diseased (a falsepositive result) or may mistakenly identify some affected people as diseasefree (a falsenegative result). A test's ability to correctly include or exclude disease depends on how likely a person is to have a disease (prior probability) and on the test's intrinsic operating characteristics.
Although diagnostic testing is often a critical contributor to clinical decision making, testing can have undesired or unintended consequences. Testing must be done with deliberation and purpose and with the expectation that the test result will reduce ambiguity surrounding patient problems and contribute to their health. In addition to the risk of providing incorrect information (thereby delaying initiation of treatment or inducing unnecessary treatment), laboratory tests consume limited resources and may themselves have adverse effects (eg, pneumothorax caused by lung biopsy) or may prompt additional unnecessary testing.
Defining a Positive Test Result
Among the most common tests are those that provide results along a continuous, quantitative scale (eg, blood glucose, WBC count). Such tests may provide useful clinical information throughout their ranges, but clinicians often use them to diagnose a condition by requiring that the result be classified as positive or negative (ie, disease present or absent) based on comparison to some established criterion or cutoff point. Such cutoff points are usually selected based on statistical and conceptual analysis that attempts to balance the rate of falsepositive results (prompting unnecessary, expensive, and possibly dangerous tests or treatments) and falsenegative results (failing to diagnose a treatable disease). Identifying a cutoff point also depends on having a gold standard to identify the disease in question.
Typically, such quantitative test results (eg, WBC count in cases of suspected appendicitis) follow some type of distribution curve (not necessarily a normal curve, although commonly depicted as such). The distribution of test results for patients with disease is centered on a different point than that for patients without disease. Some patients with disease will have a very high or very low result, but most have a result centered on a mean. Conversely, some diseasefree patients have a very high or very low result, but most have a result centered on a different mean from that for patients with disease. For most tests, the distributions overlap such that many of the possible test results occur in patients with and without disease; such results are more clearly illustrated when the curves are depicted on the same graph (see Fig. 2: Distributions of test results.). Some patients above and below the selected cutoff point will be incorrectly characterized. Adjusting a cutoff point to identify more patients with disease (increase test sensitivity) also increases the number of false positives (poor specificity), and moving the cutoff point the other way to avoid falsely diagnosing patients as having disease increases the number of false negatives. Each cutoff point is associated with a specific probability of truepositive and falsepositive results.
Receiver operating characteristic (ROC) curves:
Graphing the fraction of truepositive results (number of true positives/number with disease) against the fraction of falsepositive results (number of false positives/number without disease) for a series of cutoff points generates what is known as an ROC curve. The ROC curve graphically depicts the tradeoff between the sensitivity and specificity when the cut off point is adjusted (see Fig. 3: Typical receiver operating characteristic (ROC) curve.). By convention, the truepositive fraction is placed on the yaxis, and the falsepositive fraction is placed on the xaxis. The greater the area under the ROC curve, the better the test discriminates between patients with or without disease.
ROC curves allow tests to be compared over a variety of cutoff points. In the example, Test A performs better than Test B over all ranges. ROC curves also assist in the selection of the cutoff point designed to maximize a test's utility. If a test is designed to confirm a disease, a cutoff point with greater specificity and lower sensitivity is selected. If a test is designed to screen for occult disease, a cutoff point with greater sensitivity and lower specificity is selected.
Test Characteristics
Some clinical variables have only 2 possible results (eg, alive/dead, pregnant/not pregnant); such variables are termed categorical and dichotomous. Other categorical results may have many discrete values (eg, blood type, Glasgow Coma Scale) and are termed nominal or ordinal. Nominal variables such as blood type have no particular order. Ordinal variables such as the Glasgow Coma Scale have discrete values that are arranged in a particular order. Other clinical variables, including many typical diagnostic tests, are continuous and have an infinite number of possible results (eg, WBC count, blood glucose level). Many clinicians select a cutoff point that can cause a continuous variable to be treated as a dichotomous variable (eg, patients with a fasting blood glucose level > 126 mg/dL are considered to have diabetes). Other continuous diagnostic tests have diagnostic utility when they have multiple cutoff points or when ranges of results have different diagnostic value.
When test results can be defined as positive or negative, all possible outcomes can be recorded in a simple 2×2 table (see Table 2: Distribution of Hypothetical Test Results) from which important discriminatory test characteristics, including sensitivity, specificity, positive and negative predictive value, and likelihood ratio (LR), can be calculated (see Table 3: Distribution of Test Results of a Hypothetical Leukocyte Esterase Test in a Cohort of 1000 Women With an Assumed 30% Prevalence of UTI).
Table 2

PrintOpen table in new window 
  
Distribution of Hypothetical Test Results 
Results

Disease Present

Disease Absent

Test positive

True positive

False positive

Test negative

False negative

True negative

Total patients

All patients with disease

All patients without disease


Sensitivity, specificity, and predictive values:
Sensitivity, specificity, and predictive values are typically considered characteristics of the test itself, independent of the patient population.
Sensitivity is the likelihood of a positive test result in patients with disease (truepositive rate); a test that is positive in 8 of 10 patients with a disease has a sensitivity of 0.8 (also expressed as 80%). Sensitivity represents how well a test detects the disease; a test with low sensitivity does not identify many patients with disease, and a test with high sensitivity is useful to exclude a diagnosis when results are negative. Sensitivity is the complement of the falsenegative rate (ie, the falsenegative rate plus the sensitivity = 100%).
Specificity is the likelihood of a negative test result in patients without disease (truenegative rate); a test that is negative in 9 of 10 patients without disease has a specificity of 0.9 (or 90%). Specificity represents how well a test correctly identifies patients with disease because tests with high specificity have a low falsepositive rate. A test with low specificity diagnoses many patients without disease as having disease. It is the complement of the falsepositive rate.
Positive predictive value (PPV) is the proportion of patients with a positive test that actually have disease; if 9 of 10 positive test results are correct (true positive), the PPV is 90%. Because all positive test results have some number of true positives and some false positives, the PPV describes how likely it is that a positive test result in a given patient population represents a true positive.
Negative predictive value (NPV) is the proportion of patients with a negative test result that are actually disease free; if 8 of 10 negative test results are correct (true negative), the NPV is 80%. Because not all negative test results are true negatives, some patients with a negative test result actually have disease. The NPV describes how likely it is that a negative test result in a given patient population represents a true negative.
Likelihood ratios (LRs):
Unlike sensitivity and specificity, which do not apply to specific patient probabilities, the LR allows clinicians to interpret test results in a specific patient provided there is a known (albeit often estimated) pretest probability of disease.
The LR describes the change in pretest probability of disease when the test result is known and answers the question, “How much has the posttest probability changed now that the test result is known?” Many clinical tests are dichotomous; they are either above the cutoff point (positive) or below the cutoff point (negative) and there are only 2 possible results. Other tests give results that are continuous or occur over a range where multiple cutoff points are selected. The actual posttest probability depends on the magnitude of the LR (which depends on test operating characteristics) and the pretest probability estimation of disease. When the test being done is dichotomous and the result is either positive or negative, the sensitivity and specificity can be used to calculate positive LR (LR+) or negative LR (LR).
When the result is continuous or has multiple cutoff points, the ROC curve, not sensitivity and specificity, is used to calculate an LR that is no longer described as LR+ or LR.
Because the LR is a ratio of mutually exclusive events rather than a proportion of a total, it represents odds (see Sidebar 1: Probability and Odds) rather than probability. For a given test, the LR is different for positive and negative results.
For example, given a positive test result, an LR of 2.0 indicates the odds are 2:1 (true positives:false positives) that a positive test result represents a patient with disease. Of 3 positive tests, 2 would occur in patients with disease (true positive) and 1 would occur in a patient without disease (false positive). Because true positives and false positives are components of sensitivity and specificity calculations, the LR+ can also be calculated as sensitivity/(1 − specificity). The greater the LR+, the more information a positive test result provides; a positive result on a test with an LR+ > 10 is considered strong evidence in favor of a diagnosis. In other words, the pretest probability estimation moves strongly toward 100% when a positive test has a high LR+.
For a negative test result, an LR of 0.25 indicates that the odds are 1:4 (false negatives:true negatives) that a negative test result represents a patient with disease. Of 5 negative test results, 1 would occur in a patient with disease (false negative) and 4 would occur in patients without disease (true negative). The LR can also be calculated as (1 − sensitivity)/specificity. The smaller the LR, the more information a negative test result provides; a negative result on a test with an LR < 0.1 is considered strong evidence against a diagnosis. In other words, the pretest probability estimation moves strongly toward 0% probability when a negative test has a low LR.
Test results with LRs of 1.0 carry no information and do not affect the posttest probability of disease.
LRs are convenient for comparing tests and are also used in Bayesian analysis (see Bayes' Theorem) to interpret test results. Just as sensitivity and specificity change as cutoff points change, so do LRs. As a hypothetical example, a high cutoff for WBC count (eg, 20,000/μL) in a possible case of acute appendicitis is more specific and would have a high LR+ but also a high (and thus not very informative) LR; choosing a much lower and very sensitive cutoff (eg, 10,000/μL) would have a low LR but also a low LR+.
Dichotomous Tests
An ideal dichotomous test would have no false positives or false negatives; all patients with a positive test result would have disease (100% PPV), and all patients with a negative test result would not have disease (100% NPV).
In reality, all tests have false positives and false negatives, some tests more than others. To illustrate the consequences of imperfect sensitivity and specificity on test results, consider hypothetical results (Table 3: Distribution of Test Results of a Hypothetical Leukocyte Esterase Test in a Cohort of 1000 Women With an Assumed 30% Prevalence of UTI) of urine dipstick leukocyte esterase testing in a group of 1000 women, 300 (30%) of whom have a UTI (as determined by a goldstandard test such as urine culture). This scenario assumes for illustrative purposes that the dipstick test has sensitivity of 71% and specificity of 85%.
Sensitivity of 71% means that only 213 (71% of 300) women with UTI would have a positive test result. The remaining 87 would have a negative test result. Specificity of 85% means that 595 (85% of 700) women without UTI would have a negative test result. The remaining 105 would have a positive test result. Thus, of 318 positive test results, only 213 would be correct (213/318 = 67% PPV); a positive test result makes the diagnosis of UTI more likely than not but not certain. There would also be 682 negative tests, of which 595 are correct (595/682 = 87% NPV), making the diagnosis of UTI much less likely but still possible; 13% of patients with a negative test result would actually have a UTI.
Table 3

PrintOpen table in new window 
  
Distribution of Test Results of a Hypothetical Leukocyte Esterase Test in a Cohort of 1000 Women With an Assumed 30% Prevalence of UTI 
Results

Disease Present

Disease Absent

Total Patients

Test positive

True positive (TP)
213 patients (71% of 300)

False positive (FP)
105 patients (700 − 595)

318 patients with a positive test

Test negative

False negative (FN)
87 patients (300 − 213)

True negative (TN)
595 patients (85% of 700)

682 patients with a negative test

Total patients

300 patients with UTI (assumed)

700 patients without UTI (assumed)

1000 patients

Positive predictive value (PPV) = TP/(all patients with a positive test) = TP/(TP + FP) = 213/(213 + 105) = 67%.
Negative predictive value (NPV) = TN/(all patients with a negative test) = TN/(TN + FN) = 595/(595 + 87) = 87%.
Positive likelihood ratio (LR+) = sensitivity/(1 − specificity) = 0.71/(1 − 0.85) = 4.73.
Negative likelihood ratio (LR) = (1 − sensitivity)/specificity = (1 − 0.71)/0.85 = 0.34.


However, the PPVs and NPVs derived in this patient cohort cannot be used to interpret results of the same test when the underlying incidence of disease (pretest or prior probability) is different. Note the effects of changing disease incidence to 5% (see Table 4: Distribution of Test Results of a Hypothetical Leukocyte Esterase Test in a Cohort of 1000 Women With an Assumed 5% Prevalence of UTI). Now most positive test results are false, and the PPV is only 20%; a patient with a positive test result is actually more likely to not have a UTI. However, the NPV is now very high (98%); a negative result essentially rules out UTI.
Table 4

PrintOpen table in new window 
  
Distribution of Test Results of a Hypothetical Leukocyte Esterase Test in a Cohort of 1000 Women With an Assumed 5% Prevalence of UTI 
Results

Disease Present

Disease Absent

Total Patients

Test positive

True positive (TP)
36 patients (71% of 50)

False positive (FP)
144 patients (950 − 806)

180 patients with a positive test

Test negative

False negative (FN)
14 patients (50 − 36)

True negative (TN)
806 patients (85% of 950)

820 patients with a negative test

Total patients

50 patients with UTI (assumed)

950 patients without UTI (assumed)

1000 patients

Positive predictive value (PPV) = TP/(all with a positive test) = TP/(TP + FP) = 36/(36 + 144) = 20%.
Negative predictive value (NPV) = TN/(all with a negative test) = TN/(TN + FN) = 806/(806 + 14) = 98%.
Positive likelihood ratio (LR+) = sensitivity/(1 − specificity) = 0.71/(1 − 0.85) = 4.73.
Negative likelihood ratio (LR) = (1 − sensitivity)/specificity = (1 − 0.71)/0.85 = 0.34.


Note that in both patient cohorts, even though the PPV and NPV are very different, the LRs do not change because the LRs are determined only by test sensitivity and specificity.
Clearly, a test result does not provide a definitive diagnosis but only estimates the probability of a disease being present or absent, and this posttest probability (likelihood of disease given a specific test result) varies greatly based on the pretest probability of disease as well as the test's sensitivity and specificity (and thus its LR).
Pretest probability is not a precise measurement; it is based on clinical judgment of how strongly the symptoms and signs suggest the disease is present, what factors in the patient's history support the diagnosis, and how common the disease is in a representative population. Many clinical scoring systems are designed to estimate pretest probability; adding points for various clinical features facilitates the calculation of a score. For example, there are criteria for predicting pretest probability of pulmonary embolism (see Clinical probability). Higher calculated scores yield higher estimated probabilities.
Continuous Tests
Many test results are continuous and may provide useful clinical information over a wide range of results. Clinicians often select a certain cutoff point to maximize the test's utility. For example, a WBC count > 15,000 may be characterized as positive; values < 15,000 as negative. When a test yields continuous results but a certain cutoff point is selected, the test operates like a dichotomous test. Multiple cutoff points can also be selected. Sensitivity, specificity, PPV, NPV, LR+, and LR can be calculated for single or multiple cutoff points. Table 5: Effect of Changing the Cutoff Point of the WBC Count in Patients Suspected of Having Appendicitis illustrates the effect of changing the cutoff point of the WBC count in patients suspected of having appendicitis.
Table 5

PrintOpen table in new window 
  
Effect of Changing the Cutoff Point of the WBC Count in Patients Suspected of Having Appendicitis 
WBC Cutoff*

Sensitivity

Specificity

LR+

LR

> 10,500

84%

53.13%

1.79

0.3

> 11,500

78%

62.5%

2.13

0.32

> 12,850

68%

75%

2.72

0.43

> 13,400

61.33%

78.12%

2.86

0.45

> 14,300

56.67%

81.25%

3.2

0.49

*Various cutoff points are selected for a continuous variable such as WBC count; results above the cutoff point are considered positive and those below the cutoff point are considered negative.

LR = likelihood ratio.

Adapted from Keskek M, Tez M, Yoldas O, et al: Receiver operating characteristic analysis of leukocyte counts in operations for suspected appendicitis. American Journal of Emergency Medicine 26:769–772, 2008.


Alternatively, it can be useful to group continuous test results into levels. In this case, results are not characterized as positive or negative because there are multiple possible results, so although an LR can be determined for each level of results, there is no longer a distinct LR+ or LR. For example, Table 6: Using WBC Count Groups to Determine Likelihood Ratio of Bacteremia in Febrile Children* illustrates the relationship between WBC count and bacteremia in febrile children. Because the LR is the probability of a given result in patients with disease divided by the probability of that result in patients without the disease, the LR for each grouping of WBC count is the probability of bacteremia in that group divided by the probability of no bacteremia.
Table 6

PrintOpen table in new window 
  
Using WBC Count Groups to Determine Likelihood Ratio of Bacteremia in Febrile Children* 
WBC Count

Number of Children With Bacteremia, n = 127 (%)

Number of Children Without Bacteremia, N = 8629 (%)

LR (% With Bacteremia/% Without Bacteremia)

0–5000

0 (0.0%)

543 (6.3%)

0.00

5,001–10,000

3 (2.4%)

3291 (38.1%)

0.06

10,001–15,000

15 (11.8%)

2767 (32.1%)

0.37

15,001–20,000

48 (37.8%)

1337 (15.5%)

2.4

20,001–25,000

34 (26.8%)

469 (5.4%)

4.9

25,001–30,000

12 (9.4%)

155 (1.8%)

5.3

> 30,001

15 (11.8%)

67 (0.8%)

15.2

*Incidence of bacteremia in 8756 febrile children grouped by WBC count. LR for each group is calculated by dividing the probability of bacteremia by the probability of no bacteremia.

LR = likelihood ratio.

Adapted from Lee GM, Harper MB: Risk of bacteremia for febrile young children in the postHaemophilus influenzae type b era. Archives of Pediatric and Adolescent Medicine 152:624–628, 1998.


Grouping continuous variables allows for much greater use of the test result than when a single cutoff point is established. Using Bayesian analyses, the LRs in Table 6: Using WBC Count Groups to Determine Likelihood Ratio of Bacteremia in Febrile Children* can be used to calculate the posttest probability.
For continuous test results, if an ROC curve is known, calculations as shown in Table 6: Using WBC Count Groups to Determine Likelihood Ratio of Bacteremia in Febrile Children* do not have to be done; LRs can be found for various points over the range of results using the slope of the ROC curve at the desired point.
Bayes' Theorem
The process of using the pretest probability of disease and the test characteristics to calculate the posttest probability is referred to as Bayes' theorem or Bayesian revision. For routine clinical use, Bayesian methodology typically takes several forms:
Oddslikelihood calculation:
If the pretest probability of disease is expressed as its odds and because a test's LR represents odds, the product of the 2 represents the posttest odds of disease (analogous to multiplying 2 probabilities together to calculate the probability of simultaneous occurrence of 2 events):
Pretest odds × LR = posttest odds
Because clinicians typically think in terms of probabilities rather than odds, probability can be converted to odds (and vice versa) with these formulas:
Odds = probability/1 − probability
Probability = odds/odds + 1
Consider the example of UTI as given in Table 3: Distribution of Test Results of a Hypothetical Leukocyte Esterase Test in a Cohort of 1000 Women With an Assumed 30% Prevalence of UTI, in which the pretest probability of UTI is 0.3, and the test being used has an LR+ of 4.73 and an LR of 0.34. A pretest probability of 0.3 corresponds to odds of 0.3/(1 − 0.3) = 0.43. Thus, the posttest odds that a UTI is present in a patient with a positive test result equals the product of the pretest odds and the LR+; 4.73 × 0.43 = 2.03, which represents a posttest probability of 2.03/(1 + 2.03) = 0.67. Thus, Bayesian calculations show that a positive test result increases the pretest probability from 30% to 67%, the same result obtained in the PPV calculation in Table 3: Distribution of Test Results of a Hypothetical Leukocyte Esterase Test in a Cohort of 1000 Women With an Assumed 30% Prevalence of UTI.
A similar calculation is done for a negative test; posttest odds = 0.34 × 0.43 = 0.15, corresponding to a probability of 0.15/(1 + 0.15) = 0.13. Thus, a negative test result decreases the pretest probability from 30% to 13%, again the same result obtained in the NPV calculation in Table 3: Distribution of Test Results of a Hypothetical Leukocyte Esterase Test in a Cohort of 1000 Women With an Assumed 30% Prevalence of UTI.
Many medical calculator programs that run on handheld devices are available to calculate posttest probability from pretest probability and LRs.
Clinical Calculator


Oddslikelihood nomogram:
Using a nomogram is particularly convenient because it avoids the need to convert between odds and probabilities or create 2×2 tables.
The Fagan nomogram is depicted in Fig. 4: Fagan nomogram.. To use the nomogram, a line is drawn from the pretest probability through the LR. The posttest probability is the point at which this line intersects the posttest probability line. Sample lines in the figure are drawn using data from the UTI test in Table 3: Distribution of Test Results of a Hypothetical Leukocyte Esterase Test in a Cohort of 1000 Women With an Assumed 30% Prevalence of UTI. Line A represents a positive test result; it is drawn from pretest probability of 0.3 through the LR+ of 4.73 and gives a posttest value of slightly < 0.7, similar to the calculated probability of 0.67. Line B represents a negative test result; it is drawn from pretest probability of 0.3 through the LR value of 0.34 and gives a posttest value slightly > 0.1, similar to the calculated probability of 13%.
Although the nomogram appears less precise than calculations, typical values for pretest probability are often estimates, so the apparent precision of calculations is usually misleading.
Tabular approach:
Often, LRs of a test are not known, but sensitivity and specificity are known, and pretest probability can be estimated. In this case, Bayesian methodology can be done using a 2×2 table illustrated in Table 7: Interpretation of a Hypothetical Leukocyte Esterase (LE) Test Result in a Cohort of 1000 Women Assuming a 30% Prevalence of UTI (PreTest Probability), Test Sensitivity 71%, and Specificity 85%* using the example from Table 3: Distribution of Test Results of a Hypothetical Leukocyte Esterase Test in a Cohort of 1000 Women With an Assumed 30% Prevalence of UTI. Note that this method shows that a positive test result increases the probability of a UTI to 67%, and a negative result decreases it to 13%, the same results obtained by calculation using LRs.
Table 7

PrintOpen table in new window 
  
Interpretation of a Hypothetical Leukocyte Esterase (LE) Test Result in a Cohort of 1000 Women Assuming a 30% Prevalence of UTI (PreTest Probability), Test Sensitivity 71%, and Specificity 85%* 
Results

UTI Present

UTI Absent

300 patients with UTI

700 patients without UTI

LE test positive

213 patients (TP)

105 patients (FP)

LE test negative

87 patients (FN)

595 patients (TN)

*Bayes' theorem can be simplified to allow the calculation of posttest probability when pretest probability is known:
 Posttest probability when the test is positive = TP/(all with a positive test) = TP/(TP + FP) = 213/(213 + 105) = 67%.
 Posttest probability when the test is negative = TN/(all with a negative test) = FN/(FN + TN) = 87/(87 + 595) = 13%.

FN = false negative; FP = false positive; TN = true negative; TP = true positive.


Sequential Testing
Clinicians often do tests in sequence during many diagnostic evaluations. If the pretest odds before sequential testing are known and the LR for each of the tests in sequence is known, posttest odds can be calculated using the following formula:
Pretest odds × LR1 × LR2 × LR3 = posttest odds
This method is limited by the important assumption that each of the tests is conditionally independent of each other.
Screening Tests
Patients often must consider whether to be screened for occult disease. The premises of a screening program are that early detection improves outcome in patients with occult disease and that the falsepositive results that often occur in screening do not create a burden (eg, costs and adverse effects of confirmatory testing, unwarranted treatment) that exceeds such benefit. To minimize these possible burdens, clinicians must choose the proper screening test . Screening is not appropriate when treatments are ineffective or the disease is very uncommon (unless a subpopulation can be identified in which prevalence is higher).
Theoretically, the best test for both screening and diagnosis is the one with the highest sensitivity and specificity. However, such highly accurate tests are often complex, expensive, and invasive (eg, coronary angiography) and are thus not practical for screening large numbers of asymptomatic people. Typically, some tradeoff in sensitivity, specificity, or both must be made when selecting a screening test.
Whether a clinician chooses a test that optimizes sensitivity or specificity depends on the consequences of a falsepositive or falsenegative test result as well as the pretest probability of disease. An ideal screening test is one that is always positive in nearly every patient with disease so that a negative result confidently excludes disease in healthy patients. For example, in testing for a serious disease for which an effective treatment is available (eg, coronary artery disease), clinicians would be willing to tolerate more false positives than false negatives (lower specificity and high sensitivity). Although high sensitivity is a very important attribute for screening tests, specificity also is important in certain screening strategies. Among populations with a higher prevalence of disease, the PPV of a screening test increases; as prevalence decreases, the posttest or posterior probability of a positive result decreases. Therefore, when screening for disease in highrisk populations, tests with a higher sensitivity are preferred over those with a higher specificity because they are better at ruling out disease (fewer false negatives). On the other hand, in lowrisk populations or for uncommon diseases for which therapy has lower benefit or higher risk, tests with a higher specificity are preferred.
Multiple screening tests:
With the expanding array of available screening tests, clinicians must consider the implications of a panel of such tests. For example, test panels containing 8, 12, or sometimes 20 blood tests are often done when a patient is admitted to the hospital or is first examined by a new clinician. Although this type of testing may be helpful in screening patients for certain diseases, using the large panel of tests has potentially negative consequences. By definition, a test with a specificity of 95% gives falsepositive results in 5% of healthy, normal patients. If 2 different tests with such characteristics are done, each for a different occult disease, in a patient who actually does not have either disease, the chance that both tests will be negative is 95% × 95%, or about 90%; thus, there is a 10% chance of at least one falsepositive result. For 3 such tests, the chance that all 3 would be negative is 95% × 95% × 95%, or 86%, corresponding to a 14% chance of at least one falsepositive result. If 12 different tests for 12 different diseases are done, the chance of obtaining at least one falsepositive result is 46%. This high probability underscores the need for caution when deciding to do a screening test panel and when interpreting its results.
Testing Thresholds
A laboratory test should be done only if its results will affect management; otherwise the expense and risk to the patient are for naught. Clinicians can sometimes make the determination of when to test by comparing pretest and posttest probability estimations with certain thresholds. Above a certain probability threshold, benefits of treatment outweigh risks (including the risk of mistakenly treating a patient without disease), and treatment is indicated. This point is termed the treatment threshold (TT) and is determined as described previously (see Probability Estimations and the Treatment Threshold). By definition, testing is unnecessary when pretest probability is already above the TT. But testing is indicated if pretest probability is below the TT as long as a positive test result could raise the posttest probability above the TT. The lowest pretest probability at which this can occur depends on test characteristics (eg, LR+) and is termed the testing threshold.
Conceptually, if the best test for a serious disorder has a low LR+, and the TT is high, it is understandable that a positive test result might not move the posttest probability above the TT in a patient with a low but worrisome pretest probability (eg, perhaps 10% or 20%).
For a numerical illustration, consider the previously described case of a possible acute MI (see Probability Estimations and the Treatment Threshold) in which the balance between risk and benefit determined a TT of 25%. When the probability of MI exceeds 25%, thrombolytic therapy is given. When should a rapid echocardiogram be done before giving thrombolytic therapy? Assume a hypothetical sensitivity of 60% and a specificity of 70% for echocardiography in diagnosing an MI; these percentages correspond to an LR+ of 60/(100 − 70) = 2 and an LR of (100 − 60)/70 = 0.57.
The issue can be addressed mathematically (pretest odds × LR = posttest odds) or more intuitively graphically by using the Fagan nomogram (see Fig. 5: Fagan nomogram used to determine need to test.). On the nomogram, a line connecting the TT (25%) on the posttest probability line through the LR+ (2.0) on the middle LR line intersects a pretest probability of about 0.14. Clearly, a positive test in a patient with any pretest probability < 14% would still result in a posttest probability less than the TT. In this case, echocardiography would be useless because even a positive result would not lead to a decision to treat; thus, 14% pretest probability is the testing threshold for this particular test (see Fig. 6: Depiction of testing and treatment thresholds.). Another test with a different LR+ would have a different testing threshold.
Because 14% still represents a significant risk of MI, it is clear that a disease probability below the testing threshold (eg, a 10% pretest probability) does not necessarily mean disease is ruled out, just that a positive test result on the particular test in question would not change management and thus that test is not indicated. In this situation, the clinician would observe the patient for further findings that might elevate the pretest probability above the testing threshold. In practice, because multiple tests are often available for a given disease, sequential testing (see Sequential Testing) might be used.
This example considers a test that of itself poses no risk to the patient. If a test has serious risks (eg, cardiac catheterization), the testing threshold should be higher; quantitative calculations can be done but are complex. Thus, decreasing a test's sensitivity and specificity or increasing its risk narrows the range of probabilities of disease over which testing is the best strategy. Improving the test's ability to discriminate or decreasing its risk broadens the range of probabilities over which testing is the best strategy.
A possible exception to the proscription against testing when pretest probability is below the testing threshold (but is still worrisome) might be if a negative test result could reduce posttest probability below the point at which disease could be considered ruled out. This determination requires a subjective judgment of the degree of certainty required to say a disease is ruled out and, because low probabilities are involved, particular attention to any risks of testing.
Last full review/revision March 2010 by Douglas L. McGee, DO
Content last modified October 2013
 
