Volcano plot (statistics)

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Roadnottaken (talk | contribs) at 15:45, 4 January 2010. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
Volcano plot showing metabolomic data. The red arrows indicate points of interest that display both large-magnitude fold changes (x-axis) and high statistical significance (-log10 of the p-value, y-axis). The dashed red line marks p = 0.05: points above the line have p < 0.05 and points below it have p > 0.05. Points with less than a 2-fold change (|log2 fold change| < 1) are shown in gray.

In statistics, a volcano plot is a type of scatter plot used to quickly identify changes in large datasets composed of replicate data [1]. It plots significance against fold change on the y- and x-axes, respectively. These plots are increasingly common in omic experiments such as genomics, proteomics, and metabolomics, where one often has many thousands of replicate data points measured under two conditions and wishes to quickly identify the most meaningful changes. A volcano plot combines a measure of statistical significance from a statistical test (e.g., a p-value from an ANOVA model) with the magnitude of the change, enabling quick visual identification of those data points (genes, etc.) that display large-magnitude changes that are also statistically significant.

A volcano plot is constructed by plotting the negative logarithm of the p-value (usually base 10) on the y-axis. As a result, data points with low p-values (highly significant) appear towards the top of the plot. The x-axis is the logarithm of the fold change between the two conditions. The logarithm is used so that changes in both directions (up and down) appear equidistant from the center: with a base-2 logarithm, for example, a doubling plots at +1 and a halving at -1. Plotting points in this way produces two regions of interest: points towards the top of the plot that lie far to either the left or the right. These represent values that display large-magnitude fold changes (hence lying left or right of center) as well as high statistical significance (hence lying towards the top).
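The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. The function names, the sample values, and the cutoffs (a 2-fold change and p < 0.05, borrowed from the figure caption) are assumptions chosen for the example.

```python
import math

def volcano_coordinates(mean_a, mean_b, p_value):
    """Return (x, y) for one data point.

    x is the base-2 log of the fold change (condition B over condition A);
    y is the negative base-10 log of the p-value.
    """
    x = math.log2(mean_b / mean_a)
    y = -math.log10(p_value)
    return x, y

def is_of_interest(x, y, min_abs_log2_fc=1.0, max_p=0.05):
    """True if the point is far from center AND above the significance line."""
    return abs(x) >= min_abs_log2_fc and y > -math.log10(max_p)

# Hypothetical measurements: (mean in A, mean in B, p-value).
# These values are invented for illustration only.
points = {
    "gene_up": (10.0, 40.0, 0.001),    # 4-fold increase, significant
    "gene_down": (40.0, 10.0, 0.001),  # 4-fold decrease, significant
    "gene_flat": (10.0, 11.0, 0.001),  # significant but small change
    "gene_noisy": (10.0, 40.0, 0.5),   # large change but not significant
}

coords = {name: volcano_coordinates(*v) for name, v in points.items()}
hits = sorted(name for name, (x, y) in coords.items() if is_of_interest(x, y))
print(hits)
```

With these values the 4-fold increase and the 4-fold decrease land at the same distance on opposite sides of the center, which is the symmetry the logarithm is used to obtain. Only the two points that are both far from center and high on the plot are flagged.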

Additional information can be added by coloring the points according to a third dimension of data (such as signal intensity), although this is not done uniformly.
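One possible way to encode such a third dimension is to map each point's value onto a color scale. The sketch below assumes a simple linear grayscale in which higher signal intensity gives a darker point. The function name, the scale, and the sample intensities are illustrative assumptions, not a convention from the source.

```python
def intensity_to_gray(intensity, lo, hi):
    """Map an intensity in [lo, hi] to a hex gray; higher intensity is darker."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    frac = (intensity - lo) / (hi - lo)
    frac = min(1.0, max(0.0, frac))    # clamp values outside the range
    level = round(255 * (1.0 - frac))  # 255 = white, 0 = black
    return "#{0:02x}{0:02x}{0:02x}".format(level)

# Hypothetical signal intensities for three data points.
intensities = [100.0, 550.0, 1000.0]
colors = [
    intensity_to_gray(v, min(intensities), max(intensities))
    for v in intensities
]
print(colors)
```

The resulting hex strings can be passed as per-point colors to most plotting libraries.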

References

  1. ^ Cui X, Churchill GA (2003). "Statistical tests for differential expression in cDNA microarray experiments". Genome Biol. 4 (4): 210. PMC 154570. PMID 12702200.