Contingency table

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Seglea (talk | contribs) at 18:09, 27 August 2004 (basic facts and links to tests). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In statistics, contingency tables are used to record and analyse the relationship between two or more variables, most often categorical variables.

Suppose that we have two variables, gender (male or female) and handedness (right-handed or left-handed). We observe the values of both variables in a random sample of 100 people. Then a contingency table can be used to express the relationship between these two variables, as follows:

          right-handed   left-handed   TOTAL
male                43             9      52
female              44             4      48
TOTAL               87            13     100

The figures in the right-hand column and the bottom row are called marginal totals, and the figure in the bottom right-hand corner is the grand total. The table allows us to see at a glance that the proportion of men who are right-handed is about the same as the proportion of women who are. However, the two proportions are not identical, and the statistical significance of the difference between them can be tested with Pearson's chi-square test, a G-test or Fisher's exact test.
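The marginal totals and the two proportions can be computed directly from the four cell counts. The following is a minimal Python sketch, assuming the table is read with gender in the rows and handedness in the columns (43 right-handed and 9 left-handed men; 44 right-handed and 4 left-handed women). The names `table`, `marginal_totals` and `prop_right` are illustrative only.

```python
# Observed counts. Assumed layout: rows are gender, columns are handedness.
table = {
    "male":   {"right-handed": 43, "left-handed": 9},
    "female": {"right-handed": 44, "left-handed": 4},
}

def marginal_totals(table):
    """Return (row totals, column totals, grand total) for a nested dict of counts."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    grand_total = sum(row_totals.values())
    return row_totals, col_totals, grand_total

row_totals, col_totals, grand_total = marginal_totals(table)

# Proportion of each gender that is right-handed (cell count / row total).
prop_right = {
    row: cells["right-handed"] / row_totals[row]
    for row, cells in table.items()
}

print(row_totals, col_totals, grand_total)
for row, proportion in prop_right.items():
    print(f"{row}: {proportion:.1%} right-handed")
```

Under this reading, 43 of 52 men and 44 of 48 women are right-handed, which is the comparison the tests named above would assess.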

If the proportions of individuals in the different columns vary between rows (and, therefore, vice versa) we say that the table shows contingency between the two variables. If there is no contingency, we say that the two variables are independent.
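Independence can be made concrete: if the two variables were independent, the expected count in each cell would be the row total multiplied by the column total, divided by the grand total. The sketch below, which assumes the same reading of the example table (gender in rows, handedness in columns), computes these expected counts together with Pearson's chi-square statistic and the G statistic. It applies no continuity correction, and the p-value relies on the usual large-sample chi-square approximation with one degree of freedom, so it applies to 2 × 2 tables only. All function names are illustrative.

```python
import math

# Assumed layout: rows are male, female; columns are right-handed, left-handed.
observed = [[43, 9],
            [44, 4]]

def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi_square(observed):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

def g_statistic(observed):
    """Likelihood-ratio statistic: 2 * sum of observed * ln(observed / expected)."""
    expected = expected_counts(observed)
    return 2 * sum(o * math.log(o / e)
                   for obs_row, exp_row in zip(observed, expected)
                   for o, e in zip(obs_row, exp_row)
                   if o > 0)  # empty cells contribute nothing to the sum

def p_value_one_df(statistic):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))

expected = expected_counts(observed)
chi2 = pearson_chi_square(observed)
g = g_statistic(observed)
p_chi2 = p_value_one_df(chi2)

print(expected)
print(f"chi-square = {chi2:.3f}, G = {g:.3f}, p (chi-square) = {p_chi2:.3f}")
```

For this table the chi-square statistic is about 1.78 on one degree of freedom, with a p-value of roughly 0.18, so the difference between the two proportions would not be judged significant at the conventional 5% level.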

The example above is for the simplest kind of contingency table, in which each variable has only two levels; this is called a 2 × 2 contingency table. In principle, any number of rows and columns may be used. There may also be more than two variables, but higher-order contingency tables are hard to represent on paper. The relationship between ordinal variables, or between ordinal and categorical variables, may also be represented in contingency tables, though this is less often done since the distributions of such variables can be summarised efficiently by the median.
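A table with any number of rows and columns can be built by counting how often each pair of levels occurs among the observations. The following Python sketch shows one way of doing this; the function name `cross_tabulate` and the observations (levels A, B, C and x, y, z) are hypothetical and serve only to illustrate the layout.

```python
from collections import Counter

def cross_tabulate(pairs):
    """Count (row level, column level) pairs and arrange them as a table.

    Returns the sorted row levels, the sorted column levels, and a list of
    rows of counts. Combinations that never occur are counted as 0.
    """
    counts = Counter(pairs)
    row_levels = sorted({row for row, _ in counts})
    col_levels = sorted({col for _, col in counts})
    table = [[counts[(row, col)] for col in col_levels] for row in row_levels]
    return row_levels, col_levels, table

# Hypothetical observations of two categorical variables with three levels each.
observations = [
    ("A", "x"), ("A", "y"), ("B", "x"), ("B", "z"),
    ("C", "y"), ("A", "x"), ("C", "z"), ("B", "x"),
]

row_levels, col_levels, table = cross_tabulate(observations)

print("     " + "  ".join(col_levels))
for level, row in zip(row_levels, table):
    print(f"{level:<4} " + "  ".join(str(count) for count in row))
```

The same counting approach extends to more than two variables by counting triples (or longer tuples) instead of pairs, although the result can then no longer be laid out as a single two-dimensional table.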