Volcano plot (statistics)

In statistics, a volcano plot is a type of scatter plot used to quickly identify changes in large datasets composed of replicate data.[1] It plots significance against fold change on the y- and x-axes, respectively. These plots are increasingly common in omics experiments such as genomics, proteomics, and metabolomics, where one often has many thousands of replicate data points measured under two conditions and wishes to quickly identify the most meaningful changes. A volcano plot combines a measure of statistical significance from a statistical test (e.g., a p-value from an ANOVA model) with the magnitude of the change, enabling quick visual identification of those data points (genes, etc.) that display large-magnitude changes that are also statistically significant.
A volcano plot is constructed by plotting the negative logarithm of the p-value (usually base 10) on the y-axis. As a result, data points with low p-values (highly significant) appear towards the top of the plot. The x-axis is the logarithm of the fold change between the two conditions. The logarithm is used so that changes in both directions appear equidistant from the center: a doubling and a halving, for example, lie at equal distances on opposite sides. Plotting points in this way produces two regions of interest: the points towards the top of the plot that are far to either the left- or the right-hand side. These represent values that display large-magnitude fold changes (hence lying left or right of center) as well as high statistical significance (hence lying towards the top).
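
The construction above can be sketched in a few lines of Python. This is a minimal illustration rather than a standard implementation: the function names, the base-2 logarithm for the fold change, the example data, and the cutoffs (a fold change of at least 2 in either direction and p ≤ 0.05) are all assumptions chosen for the example, not values given by the sources cited here.

```python
import math


def volcano_coordinates(fold_change, p_value):
    """Return (x, y) for one data point.

    x is the log fold change (base 2 is an assumption of this sketch),
    y is the negative base-10 logarithm of the p-value.
    """
    x = math.log2(fold_change)
    y = -math.log10(p_value)
    return x, y


def is_of_interest(fold_change, p_value, min_abs_log2_fc=1.0, max_p=0.05):
    """True if the point lies in the upper-left or upper-right region.

    Both cutoffs are hypothetical defaults for illustration only.
    """
    x, y = volcano_coordinates(fold_change, p_value)
    return abs(x) >= min_abs_log2_fc and y >= -math.log10(max_p)


# Hypothetical measurements: name -> (fold change, p-value).
points = {
    "gene_a": (4.0, 0.001),   # large increase, significant
    "gene_b": (0.25, 0.001),  # large decrease, significant
    "gene_c": (1.1, 0.5),     # small change, not significant
    "gene_d": (8.0, 0.2),     # large change, not significant
}

for name, (fc, p) in points.items():
    x, y = volcano_coordinates(fc, p)
    print(f"{name}: x={x:+.2f}, y={y:.2f}, of interest={is_of_interest(fc, p)}")
```

In this sketch a four-fold increase and a four-fold decrease map to x = +2 and x = -2, which is the symmetry the logarithm is meant to provide. The resulting (x, y) pairs can be passed to any scatter-plot routine.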
Additional information can be added by coloring the points according to a third dimension of data (such as signal intensity), but this is not uniformly employed. A volcano plot is also used to graphically display the significance analysis of microarrays (SAM) gene selection criterion, an example of regularization.[2]
The concept of the volcano plot can be generalized to other applications, where the x-axis is related to a measure of the strength of a statistical signal and the y-axis is related to a measure of the statistical significance of the signal. For example, in a genetic association case-control study, such as a genome-wide association study, a point in a volcano plot represents a single-nucleotide polymorphism. Its x value can be the odds ratio and its y value can be -log10 of the p-value from a chi-squared test, or a chi-squared test statistic.[3]
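
As a rough illustration of the genetic association case, the sketch below derives one point from a 2×2 table of carrier counts. Several things here are assumptions of the example rather than anything prescribed by the cited work: the function name and the counts are hypothetical, the test is a Pearson chi-squared test without continuity correction on one degree of freedom, and all four cells are assumed to be non-zero.

```python
import math


def association_point(case_carriers, case_noncarriers,
                      control_carriers, control_noncarriers):
    """Return (odds_ratio, chi2, p_value, neg_log10_p) for a 2x2 table.

    Assumes all four counts are positive. The p-value uses the
    chi-squared distribution with one degree of freedom, for which
    P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    n = a + b + c + d

    odds_ratio = (a * d) / (b * c)

    # Pearson chi-squared statistic for a 2x2 table, no continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    # Note: for extremely large chi2 the p-value can underflow to 0.0,
    # in which case log10 would fail; this sketch does not handle that.
    return odds_ratio, chi2, p_value, -math.log10(p_value)


# Hypothetical counts for one single-nucleotide polymorphism.
odds_ratio, chi2, p_value, y = association_point(30, 70, 15, 85)
print(f"x (odds ratio) = {odds_ratio:.3f}")
print(f"chi-squared    = {chi2:.3f}")
print(f"y (-log10 p)   = {y:.3f}")
```

Either the chi-squared statistic or -log10 of the p-value could serve as the y value, since both increase as the evidence for association grows.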
References
- ^ Cui, X; Churchill, GA (2003). "Statistical tests for differential expression in cDNA microarray experiments". Genome Biol. 4 (4): 210. doi:10.1186/gb-2003-4-4-210. PMC 154570. PMID 12702200.
- ^ Li, W (2012). "Volcano plots in analyzing differential expression with mRNA microarrays". J. Bioinfo. and Comp. Biol. 10 (6): 1231003. doi:10.1142/S0219720012310038. PMID 23075208.
- ^ Li, W; Freudenberg, J; Suh, YJ; Yang, Y (2014). "Using volcano plots and regularized-chi statistics in genetic association studies". Comp. Biol. and Chem. 48: 77–83. doi:10.1016/j.compbiolchem.2013.02.003. PMID 23602812.