Decision curve analysis

Decision curve analysis evaluates a predictor for an event, where values of the predictor beyond a threshold indicate that some intervention or treatment should be performed. As an example, a doctor suspecting prostate cancer in a patient may consider performing a biopsy (the intervention/treatment) if a blood test value is above some level, or if a prediction model gives a risk above some value (the threshold). The doctor wants to perform biopsies in the right subset of patients, i.e. those in whom the biopsy will reveal cancer, while avoiding painful biopsies in patients who are not sick.

The purpose of decision curve analysis is to evaluate whether use of the predictor will, on average, bring more benefit than harm. The threshold probability is used as a measure of the harm of performing unnecessary treatment (false positives) compared to the benefit of treatment in those who need it (true positives). The net benefit is calculated as the weighted difference between the rates of true and false positives. This represents the benefit of treating the right patients (the true positives) minus the harm of treating those who did not need treatment (the false positives). The net benefit is benefit relative to doing nothing: a strategy of treating none has a net benefit of zero.

The decision curve is typically shown as a graphical plot of the net benefit from using the predictor, plotted as a function of the threshold probability. Two default strategies are also plotted: Treat all and treat none. "Treat all" is a relevant strategy for cases where the harm of the treatment procedure is considered low, e.g. a screening test. "Treat none" is a relevant strategy if no specific knowledge is available. The predictor will only be of value in a range where its net benefit is above both the “treat all” curve and the “treat none” curve.

The threshold probability is the minimum probability of an event at which a decision-maker would take a given action, for instance, the probability of cancer at which a doctor would order a biopsy. If the threshold is low, action will be taken on a ‘smaller’ indication than if the threshold is high. A lower threshold probability implies a greater concern about the event (e.g. a patient worried about cancer), while a higher threshold implies greater concern about the action to be taken (e.g. a patient averse to the biopsy procedure).

Decision curve analysis does not indicate which threshold probability should be used. Instead, the researcher evaluating the predictor should specify a range of relevant threshold probabilities and focus on the plot within this range.^[1]^[2]

The predictor evaluated by decision curve analysis could be a binary classifier (yes/no), or a percentage risk from a prediction model. In the latter case, treatment is indicated if the percentage risk is higher than the threshold probability.

General theory

The use of threshold probability to weight true and false positives derives from decision theory, in which the expected value of a decision can be calculated from the utilities and probabilities associated with decision outcomes. In the case of predicting an event, there are four possible outcomes: true positive, true negative, false positive and false negative. This means that to conduct a decision analysis, the analyst must specify four different utilities, which is often challenging.

In decision curve analysis, the strategy of treating no patients (considering all observations as negative) is defined as having a value of zero. This means that only true positives (event identified and appropriately managed) and false positives (unnecessary action) are considered.^[3]

The threshold probability is used to measure the harm of unnecessary treatment (false positives) weighed against the benefit of appropriate treatment (true positives). It can be shown that the ratio of the utility of a true positive to the utility of avoiding a false positive equals the inverse of the odds at the threshold probability, $(1-p_{t})/p_{t}$, where $p_{t}$ is the threshold probability.^[4] For instance, a doctor whose threshold probability for ordering a biopsy is 10% believes that the utility of finding cancer early is 9 times greater (odds 90:10 = 9:1) than that of avoiding the harm of an unnecessary biopsy. Analogous to the calculation of expected value, weighting false positive outcomes by a factor derived from the threshold probability yields an estimate of net benefit that incorporates decision consequences and preferences.^[2]
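The odds relation can be sketched with a short indifference argument in the spirit of the decision-theory reasoning above. For illustration only, let $B$ denote the benefit of treating a patient who has the event (utility of a true positive minus that of a false negative) and $H$ the harm of treating a patient who does not (utility of a true negative minus that of a false positive); these two symbols are introduced here and are not notation from the cited sources. At the threshold probability the decision-maker is, by definition, indifferent between treating and not treating, so the expected benefit equals the expected harm:

$$p_{t}\,B = (1-p_{t})\,H \quad\Longrightarrow\quad \frac{B}{H} = \frac{1-p_{t}}{p_{t}}, \qquad \frac{H}{B} = \frac{p_{t}}{1-p_{t}}$$

With $p_{t}=0.1$, $B/H = 0.9/0.1 = 9$, the 9:1 ratio of the biopsy example. Measuring harm in units of benefit, each false positive therefore costs $p_{t}/(1-p_{t})$ true positives, which is the weight applied to false positives in the net benefit.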

Net benefit

Theory

Net benefit is calculated as a weighted combination of true and false positives, where $p_{t}$ is the threshold probability for treatment (intervention), true and false positives are count variables, and $N$ is the total number of observations:^[4]

$$\text{Net benefit} = \frac{\text{True positives}}{N} - \frac{\text{False positives}}{N} \times \frac{p_{t}}{1-p_{t}}$$

The theoretical maximal value of the net benefit equals the prevalence of the disease in the population considered.^[1]

$$\text{Prevalence} = \frac{\text{Total number of sick}}{N} = \frac{\text{Maximal number of true positives}}{N}$$

If the threshold probability is set to zero, $p_{t}=0$, then a “treat all” approach will achieve this maximal value.

There is no lower bound to the net benefit; it can be arbitrarily negative.^[4] As $p_{t}$ approaches the upper limit of 1, the factor multiplying the false positive rate grows without bound.
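The formula can be checked numerically with a minimal sketch. The function name `net_benefit` and the cohort counts below are hypothetical, chosen so that the prevalence is 0.15 as in the plotted example. The sketch shows the maximal value (the prevalence) for “treat all” at $p_{t}=0$ and the unbounded negative values as $p_{t}$ approaches 1.

```python
def net_benefit(true_positives: int, false_positives: int, n: int, p_t: float) -> float:
    """Net benefit = TP/N - FP/N * p_t / (1 - p_t), defined for 0 <= p_t < 1."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p_t < 1.0:
        raise ValueError("threshold probability must satisfy 0 <= p_t < 1")
    exchange_rate = p_t / (1.0 - p_t)  # odds at the threshold: the 'price' of one false positive
    return true_positives / n - (false_positives / n) * exchange_rate


# Hypothetical cohort: N = 1000 patients, 150 of them sick (prevalence 0.15).
# "Treat all" makes every patient positive, so TP = 150 and FP = 850.
print(net_benefit(150, 850, 1000, 0.0))   # 0.15, the prevalence (maximal value)
print(net_benefit(150, 850, 1000, 0.10))  # about 0.056
print(net_benefit(150, 850, 1000, 0.90))  # about -7.5: no lower bound as p_t -> 1
```

The threshold is restricted to $0 \le p_{t} < 1$ because the weighting factor is undefined at $p_{t}=1$.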

Interpretation

Net benefit is expressed in units of true positives. For instance, a net benefit of 0.07 is the same as finding 7 true positives per 100 patients in the target population.^[1]

Finding sick patients (true positives) is beneficial, while treating healthy patients (false positives) involves some degree of harm. Harm is here to be understood in a broad sense, which can include patient discomfort, the risk of negative side effects, etc. The calculation of net benefit quantifies this harm by converting false positives into a number that is subtracted from the true positives. A negative net benefit means that performing the treatment in the selected patients will on average do more harm than good.

The threshold probability, $p_{t}$, is the probability that must be exceeded before action is taken. In the literature on decision curve analysis, the action is often called treatment, which can be any kind of intervention, including performing a test that involves risk and/or patient discomfort, e.g. biopsy.^[1]

The net benefit can be compared to net profit in a trade: $\text{net profit} = \text{income} - \text{expenditures}$.^[1] In this comparison,

the true positive rate corresponds to the income from the sale (e.g. in dollars),
the false positive rate corresponds to what must be put into the sale (e.g. in units of goods, or in euros),
and the weighting factor corresponds to the exchange rate (dollars per unit of goods, or dollars per euro).

Thus, the factor ${p_{t} \over 1-p_{t}}$ is the exchange rate from false positives (harm) to true positives (benefit). Put another way, the factor is the ‘price’ of one false positive. Mathematically, the factor is the odds corresponding to probability $p_{t}$.

As an example, the threshold probability $p_{t}=0.1=10\%$ corresponds to odds 10:90 = 1:9. In this case, the price of a false positive is 1/9 of the value of one true positive. Expressed the other way around: finding 1 true positive is worth the cost of 9 false positives. In the biopsy example, $p_{t}=10\%$ corresponds to saying that finding 1 positive biopsy is worth the harm of performing a biopsy in 9 persons who turn out to be healthy.^[1]

Setting $p_{t}=0$ gives false positives a 'price' of zero, i.e. 1 true positive is in that case worth any number of false positives (no threshold). Conversely, setting $p_{t}=1=100\%$ gives an infinitely high price on a false positive, so even if there is a 99.999...% chance that the patient is (truly) positive, treatment should not be performed.
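The conversion between a threshold probability and the ‘price’ of a false positive can be sketched in a few lines. The helper names are hypothetical; `fractions.Fraction` from the Python standard library is used so that ratios such as 1/9 are exact rather than rounded. The thresholds 2%, 10% and 30% are the values used in this article.

```python
from fractions import Fraction


def false_positive_price(p_t: Fraction) -> Fraction:
    """Odds p_t / (1 - p_t): harm of one false positive in units of true positives."""
    if not 0 <= p_t < 1:
        raise ValueError("need 0 <= p_t < 1")
    return p_t / (1 - p_t)


def false_positives_per_true_positive(p_t: Fraction) -> Fraction:
    """Reciprocal odds (1 - p_t) / p_t: false positives that one true positive is worth."""
    if not 0 < p_t <= 1:
        raise ValueError("need 0 < p_t <= 1")
    return (1 - p_t) / p_t


for percent in (2, 10, 30):
    p_t = Fraction(percent, 100)
    print(f"{percent}%: price of a false positive = {false_positive_price(p_t)}, "
          f"one true positive is worth {false_positives_per_true_positive(p_t)} false positives")
```

This prints a price of 1/49, 1/9 and 3/7 for the three thresholds, i.e. one true positive is worth 49, 9 and 7/3 false positives. At $p_{t}=0$ the price is zero, matching the ‘no threshold’ case above.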

Decision curve interpretation

A decision curve analysis graph is drawn by plotting threshold probability on the x-axis and net benefit on the y-axis, illustrating the trade-off between benefit (true positives) and harm (false positives). The net benefit varies as the threshold probability (the preference for or against treatment) is varied.^[4] (Details on how to draw the plot are given below.)

For easier interpretation, the x-axis may be labelled “preference” instead of “threshold probability”, with preference for intervention to the left (low threshold for treatment) and preference for non-intervention to the right (high threshold for treatment). Similarly, the y-axis may be labelled “benefit” instead of “net benefit”.^[1]

The figure gives a hypothetical example of biopsy for cancer in some patient group.

The horizontal line at 0 represents the (lack of) net benefit when none are treated. The sloping line represents the strategy of treating all, whose net benefit depends on the threshold probability. The two grey curves show the decision curves for two different prediction models, A and B, i.e. two models giving the doctor information about whether or not to biopsy a given patient.

Treat none: This is a reasonable default strategy if no specific information is available, and treatment involves some degree of harm. For example, a doctor will likely not biopsy a patient just because the patient mentions a worry of prostate cancer (but may instead take a blood sample to get more specific information). The “treat none” strategy has a net benefit of zero.
Treat all: This is a reasonable default strategy if the treatment may give benefit and involves negligible harm. For a given group (men of a certain age, patients coming to the doctor worried about their health, etc.), it can be reasonable to take a blood test in all patients. The same applies to screening tests. A “treat all” strategy can also give a positive net benefit when there is some harm, as long as the (average) benefit outweighs the harm. However, as the ‘cost’ of false positives increases (a higher threshold probability in the plot, leading to a higher exchange rate in the net benefit formula), the net benefit of treating all patients decreases. The “Treat all” curve crosses the y-axis at a net benefit equal to the event prevalence, and crosses the “Treat none” line at a threshold probability equal to the prevalence.^[4]
Model A: In the example plot, the decision curve for Model A is everywhere above or equal to both the “Treat all” and the “Treat none” curves. Thus, Model A performs at least as well as both of these default strategies. However, for threshold probabilities below 5%, the decision curve for Model A is virtually indistinguishable from the “Treat all” curve. A low threshold indicates that the doctor is very likely to perform the treatment, e.g. a biopsy, within this patient group. Using Model A with a threshold probability of 5% may spare some patients an unneeded biopsy (less harm), but a few patients with cancer will also be overlooked (less benefit). At threshold probabilities below 5%, these effects balance out, and Model A brings neither more nor less net benefit than treating all patients.
Model B: This model performs better than the “treat none” strategy, but for probability thresholds less than about 10%, it gives less net benefit than the “treat all” strategy. This corresponds to Model B underestimating the probability of cancer, with the consequence that too few patients are biopsied. Expressed differently, Model B is not well calibrated to the actual probability.^[5]

Net benefit on the y-axis is expressed in units of true positives per person.^[1] For instance, a difference in net benefit of 0.025 at a given threshold probability between two predictors of cancer, Model A and Model B, could be interpreted as “using Model A instead of Model B to order biopsies increases the number of cancers detected by 25 per 1000 patients, without changing the number of unnecessary biopsies.”

In the example, the prevalence of the condition in the considered population is 0.15 = 15%. This can be read off the plot where the “Treat all” curve crosses one of the axes: on the y-axis at a zero threshold (net benefit = 0.15), or on the x-axis where the net benefit of “Treat all” becomes zero (threshold probability = 15%).
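Both crossing points follow from the net benefit formula. Writing $\pi$ for the prevalence (a symbol introduced here for convenience), the “treat all” strategy has $\pi N$ true positives and $(1-\pi)N$ false positives, so its net benefit is

$$\mathrm{NB}_{\text{all}}(p_{t}) = \pi - (1-\pi)\,\frac{p_{t}}{1-p_{t}}$$

At $p_{t}=0$ this equals $\pi$, the y-axis intercept. Setting it to zero gives $\pi(1-p_{t}) = (1-\pi)\,p_{t}$, which simplifies to $\pi = p_{t}$, so the curve meets the “treat none” line exactly at the prevalence: $p_{t}=0.15$ in the example.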

Threshold probability range and clinical relevance

The decision curve cannot be used to choose the best threshold. Instead, a relevant range of threshold probabilities should be defined from the clinical situation, and the decision curve analysis should focus on this range.^[1]

In the clinic, a doctor may use a prediction model (e.g. Model A from the plotted example) whose output is the probability of the patient having prostate cancer. If the probability is above the threshold, a biopsy is performed. The “best” threshold will depend on the clinical situation, involving both the doctor and the patient. If the patient is very worried about cancer, the threshold for performing a biopsy can be set low (a lower value of $p_{t}$). If, on the other hand, the patient is averse to biopsy and not very worried about the risk of cancer, the threshold can be set at a higher level. Based on experience and personal judgement, individual doctors may also work with different ranges.

Despite these differences, it may be possible to reach a consensus on the outer limits that are relevant. As an example, suppose that all doctors can agree that if the risk of cancer is below 2%, a biopsy is too invasive, and that if the risk of cancer is above 30%, a biopsy should surely be performed. Individual doctors will very likely work within narrower ranges, but nobody is in doubt outside the 2% to 30% range (in the example). Accordingly, the information from the decision curve is not relevant outside this range. Expressed differently:

A threshold probability of 2% corresponds to a willingness to perform 100 biopsies to find 2 cancers, while in the other 98 patients the biopsy will show that the patient is healthy. This corresponds to odds of 2:98 = 1:49 of finding a cancer in a patient with just a 2% probability. In the example, the doctors agree that biopsies in 49 healthy patients are too high a price for finding 1 cancer.
A threshold probability of 30% corresponds to performing 100 biopsies to find 30 cancers. The odds are 30:70 = 3:7, and the doctors agree that 7 biopsies in healthy patients is a price surely worth the benefit of finding 3 cancers (equivalently, finding 1 cancer is surely worth the harm of performing biopsies in 7/3 ≈ 2.33 healthy patients).

In a given situation, the doctor should choose a threshold that balances benefit and harm. Quoting Vickers et al.:^[1]

For instance, a doctor might say “Thinking about this patient, I wouldn’t do more than 10 biopsies to find one high-grade cancer in patients with similar health and who think about the risks and benefits of biopsy vs. finding cancer in the same way. So if a patient’s risk was above 10% I do a biopsy, otherwise, I just carefully monitor the patient and perhaps do a biopsy later if I saw a reason to.”

It can be noted that the doctor uses only the prediction model, not the decision curve. Decision curve analysis is useful in research for deciding whether or not a model is clinically relevant, but the decision curve is not used in the clinic. This is similar to how a statistical p value can be used to conclude that a drug has effect, but the p value is not used in the clinic.^[1]

Drawing the decision curve diagram

To draw a decision curve diagram, a researcher needs a dataset from a number of cases (patients) in which both the input data to the model (e.g. values from relevant blood samples) and the correct result (e.g. prostate cancer or no prostate cancer) are known.

Treat none curve

If none are treated, then there will be no positives, neither true positives nor false positives. Thus, the net benefit is zero regardless of the threshold probability, and the “treat none” curve is a horizontal line at y = 0.

Treat all curve

Treating all corresponds to a test with a positive result for all patients. In this case:

$$\text{number of true positives} = \text{number of sick}$$

$$\text{number of false positives} = \text{number of healthy}$$

where the number of sick persons is known from the data. For the “treat all” strategy, the numbers of true and false positives are fixed, but the harm (‘cost’) of these false positives depends on the threshold probability, $p_{t}$: a high $p_{t}$ corresponds to setting a high ‘price’ on false positives. Using the formula for net benefit, the net benefit of the “treat all” strategy can easily be calculated and plotted for a range of $p_{t}$ values.

For $p_{t}=0$, the net benefit of the “treat all” strategy reaches its maximal value, the prevalence $= \text{number of sick}/N$. For higher values of $p_{t}$, the false positives correspond to more harm. At $p_{t}$ equal to the prevalence, the net benefit is zero, and for $p_{t}$ above the prevalence the “treat all” strategy has a negative net benefit.^[2]
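The “treat none” and “treat all” curves need only the number of sick patients and $N$. The sketch below tabulates both over a grid of thresholds; the function names and the counts (150 sick among 1000, prevalence 0.15) are hypothetical, and the values could be passed to any plotting library to draw the two reference curves.

```python
def treat_all_net_benefit(n_sick: int, n: int, p_t: float) -> float:
    """'Treat all': TP = number of sick, FP = number of healthy."""
    if not 0.0 <= p_t < 1.0:
        raise ValueError("threshold probability must satisfy 0 <= p_t < 1")
    true_positive_rate = n_sick / n          # equals the prevalence
    false_positive_rate = (n - n_sick) / n   # equals 1 - prevalence
    return true_positive_rate - false_positive_rate * p_t / (1.0 - p_t)


def treat_none_net_benefit(p_t: float) -> float:
    """'Treat none': no positives at all, so net benefit is zero at every threshold."""
    return 0.0


# Hypothetical data: 150 sick among 1000 patients (prevalence 0.15).
n_sick, n = 150, 1000
thresholds = [i / 100 for i in range(0, 31, 5)]  # 0%, 5%, ..., 30%
for p_t in thresholds:
    print(f"p_t = {p_t:.2f}   treat all = {treat_all_net_benefit(n_sick, n, p_t):+.4f}   "
          f"treat none = {treat_none_net_benefit(p_t):+.4f}")
```

The “treat all” column starts at 0.15, reaches (numerically) zero at $p_{t}=0.15$ and is negative beyond it, as described above.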

Model curve

Given the data, the model result can be calculated for each patient. In a yes/no model, positive = yes, and comparison with the reference result allows calculation of the numbers of true and false positives, from which the model curve can be plotted. In a model giving a probability prediction, a positive output corresponds to $\text{probability prediction} > p_{t}$. In this case, the numbers of both true and false positives depend on $p_{t}$. Again, the calculated net benefit can be plotted for all (relevant) values of $p_{t}$.
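For a model giving probability predictions, the true and false positives must be recounted at every threshold. A minimal sketch follows; the function name `model_net_benefit` and the ten-patient data are hypothetical. A yes/no model can be handled with the same function by coding its output as 1.0 (yes) or 0.0 (no), since then "risk above $p_{t}$" is simply "yes" for any $0 \le p_{t} < 1$.

```python
from typing import Sequence


def model_net_benefit(risks: Sequence[float], outcomes: Sequence[int], p_t: float) -> float:
    """Net benefit when patients with predicted risk above p_t are treated."""
    if len(risks) != len(outcomes) or len(risks) == 0:
        raise ValueError("risks and outcomes must be non-empty and of equal length")
    if not 0.0 <= p_t < 1.0:
        raise ValueError("threshold probability must satisfy 0 <= p_t < 1")
    n = len(outcomes)
    # A patient is 'positive' when the predicted risk exceeds the threshold.
    tp = sum(1 for r, y in zip(risks, outcomes) if r > p_t and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r > p_t and y == 0)
    return tp / n - (fp / n) * p_t / (1.0 - p_t)


# Hypothetical data: predicted risks and true outcomes (1 = event) for 10 patients.
risks = [0.05, 0.10, 0.20, 0.35, 0.60, 0.02, 0.08, 0.15, 0.40, 0.70]
outcomes = [0, 0, 1, 0, 1, 0, 0, 0, 1, 1]
for p_t in (0.05, 0.10, 0.20, 0.30):
    print(f"p_t = {p_t:.2f}   model = {model_net_benefit(risks, outcomes, p_t):+.4f}")
```

At $p_{t}=0.10$, six patients are positive (4 true, 2 false), giving $0.4 - 0.2 \times 0.1/0.9 \approx 0.378$. Restricting the threshold loop to the clinically relevant range is all that is needed to plot only that part of the curve.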

The decision curve may be plotted for threshold probabilities from 0% to 100%. However, the relevant part of the curve is only the clinically relevant range of possible threshold values, so it makes sense to plot only this part of the curve.^[1]^[2]

References

[1] Vickers, Andrew J.; van Calster, Ben; Steyerberg, Ewout W. (2019-10-04). "A simple, step-by-step guide to interpreting decision curve analysis". Diagnostic and Prognostic Research. 3: 18. doi:10.1186/s41512-019-0064-7. ISSN 2397-7523. PMC 6777022. PMID 31592444.
[2] van Calster, Ben; Wynants, Laure; Verbeek, Jan F.M.; Verbakel, Jan Y.; Christodoulou, Evangelia; Vickers, Andrew J.; Roobol, Monique J.; Steyerberg, Ewout W. (December 2018). "Reporting and Interpreting Decision Curve Analysis: A Guide for Investigators". European Urology. 74 (6): 796–804. doi:10.1016/j.eururo.2018.08.038. PMC 6261531. PMID 30241973.
[3] Baker, Stuart G.; Cook, Nancy R.; Vickers, Andrew; Kramer, Barnett S. (2009-10-01). "Using relative utility curves to evaluate risk prediction". Journal of the Royal Statistical Society, Series A (Statistics in Society). 172 (4): 729–748. doi:10.1111/j.1467-985X.2009.00592.x. ISSN 0964-1998. PMC 2804257. PMID 20069131.
[4] Vickers, Andrew J.; Elkin, Elena B. (November 2006). "Decision curve analysis: a novel method for evaluating prediction models". Medical Decision Making. 26 (6): 565–574. doi:10.1177/0272989X06295361. ISSN 0272-989X. PMC 2577036. PMID 17099194.
[5] van Calster, Ben; Vickers, Andrew J. (February 2015). "Calibration of risk prediction models: Impact on decision-analytic performance". Medical Decision Making. 35: 162–169. doi:10.1177/0272989X14547233. ISSN 0272-989X. PMID 25155798.
