Jump to content

Volcano plot (statistics)

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by 148.177.1.211 (talk) at 15:28, 13 October 2014 (spelling ~~~~). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
Volcano plot showing metabolomic data. The red arrows indicate points-of-interest that display both large-magnitude fold-changes (x-axis) as well as high statistical significance (-log10 of p-value, y-axis). The dashed red-line shows where p = 0.05 with points above the line having p < 0.05 and points below the line having p > 0.05. This plot is colored such that those points having a fold-change less than 2 (log2 = 1) are shown in gray.

In statistics, a volcano plot is a type of scatter-plot that is used to quickly identify changes in large datasets composed of replicate data.[1] It plots significance versus fold-change on the y- and x-axes, respectively. These plots are increasingly common in omic experiments such as genomics, proteomics, and metabolomics where one often has a list of many thousands of replicate datapoints between two conditions and one wishes to quickly identify the most-meaningful changes. A volcano plot combines a measure of statistical significance from a statistical test (e.g., a p-value from an ANOVA model) with the magnitude of the change enabling quick visual identification of those data-points (genes, etc.) that display large-magnitude changes that are also statistically significant.

A volcano plot is constructed by plotting the negative log of the p-value on the y-axis (usually base 10). This results in datapoints with low p-values (highly significant) appearing toward the top of the plot. The x-axis is the log of the fold change between the two conditions. The log of the fold-change is used so that changes in both directions (up and down) appear equidistant from the center. Plotting points in this way results in two regions of interest in the plot: those points that are found toward the top of the plot that are far to either the left- or the right-hand side. These represent values that display large magnitude fold changes (hence being left- or right- of center) as well as high statistical significance (hence being toward the top).

Additional information can be added by coloring the points according to a third dimension of data (such as signal-intensity) but this is not uniformly employed. Volcano plot is also used to graphically display a significance analysis of microarrays (SAM) gene selection criterion, an example of the regularization .[2]

The concept of volcano plot can be generalized to other applications, where the x-axis is related to a measure of the strength of a statistical signal, and y-axis is related to a measure of the statistical significance of the signal. For example, in a genetic association case-control study, such as Genome-wide association study, a point in a volcano plot represents a single-nucleotide polymorphism. Its x value can be the odds ratio and its y value can be -log10 of the p-value from Chi-square test or a Chi-square test statistic.[3]

References

  1. ^ Attention: This template ({{cite doi}}) is deprecated. To cite the publication identified by doi:10.1186/gb-2003-4-4-210, please use {{cite journal}} (if it was published in a bona fide academic journal, otherwise {{cite report}} with |doi=10.1186/gb-2003-4-4-210 instead.
  2. ^ Attention: This template ({{cite doi}}) is deprecated. To cite the publication identified by doi:10.1142/S0219720012310038, please use {{cite journal}} (if it was published in a bona fide academic journal, otherwise {{cite report}} with |doi=10.1142/S0219720012310038 instead.
  3. ^ Attention: This template ({{cite doi}}) is deprecated. To cite the publication identified by doi:10.1016/j.compbiolchem.2013.02.003, please use {{cite journal}} (if it was published in a bona fide academic journal, otherwise {{cite report}} with |doi=10.1016/j.compbiolchem.2013.02.003 instead.