Sampling error

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Euan Richard (talk | contribs) at 12:16, 24 January 2021 (Separate secondary definition, remove fluff). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In statistics, sampling errors are incurred when the statistical characteristics of a population are estimated from a subset, or sample, of that population. Since the sample does not include all members of the population, statistics of the sample (often known as estimators), such as means and quartiles, generally differ from the statistics of the entire population (known as parameters). The difference between the sample statistic and the population parameter is considered the sampling error.[1] For example, if one measures the height of a thousand individuals from a population of one million, the average height of the thousand is typically not the same as the average height of all one million people in that population.
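The height example can be sketched as a small simulation. This is only an illustration: the population is synthetic, and the normal distribution with mean 170 cm and standard deviation 10 cm, the seed, and the function name `sampling_error_of_mean` are all assumptions, not part of the source.

```python
import random
import statistics


def sampling_error_of_mean(population, sample_size, rng):
    """Return (sample mean - population mean) for one simple random sample."""
    sample = rng.sample(population, sample_size)  # drawn without replacement
    return statistics.fmean(sample) - statistics.fmean(population)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population of one million heights in cm (assumed distribution).
population = [rng.gauss(170.0, 10.0) for _ in range(1_000_000)]

error = sampling_error_of_mean(population, 1000, rng)
print(f"sampling error of the mean: {error:+.3f} cm")
```

In a real survey the population mean is unknown, so this difference cannot be computed directly; it is available here only because the whole population was simulated.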

Since sampling is almost always done to estimate population parameters that are unknown, the sampling errors cannot, by definition, be measured exactly. However, they can often be estimated, either by general methods such as bootstrapping, or by specific methods incorporating some assumptions (or guesses) regarding the true population distribution and its parameters.
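As a minimal sketch of the bootstrapping approach, the standard error of a sample mean can be estimated by repeatedly resampling the observed sample with replacement. The function name `bootstrap_standard_error`, the synthetic data, and the choice of 2000 resamples are illustrative assumptions.

```python
import random
import statistics


def bootstrap_standard_error(sample, n_resamples, rng):
    """Estimate the standard error of the sample mean by resampling."""
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(sample, k=n)  # draw n values with replacement
        means.append(statistics.fmean(resample))
    # The spread of the resampled means approximates the standard error.
    return statistics.stdev(means)


rng = random.Random(0)
# Hypothetical observed sample of 1000 heights in cm (assumed distribution).
sample = [rng.gauss(170.0, 10.0) for _ in range(1000)]

se_boot = bootstrap_standard_error(sample, 2000, rng)
# Textbook estimate s / sqrt(n), shown only for comparison.
se_formula = statistics.stdev(sample) / len(sample) ** 0.5
print(f"bootstrap SE: {se_boot:.3f}, formula SE: {se_formula:.3f}")
```

The bootstrap uses only the sample itself, which is why it counts as a general method: no assumption about the shape of the population distribution is needed.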

Description

Random sampling

In statistics, sampling error is the error caused by observing a sample instead of the whole population.[1] The sampling error is the difference between a sample statistic used to estimate a population parameter and the actual but unknown value of the parameter.[2] An estimate of a quantity of interest, such as an average or percentage, will generally be subject to sample-to-sample variation.[1] These variations in the possible sample values of a statistic can theoretically be expressed as sampling errors (a type of bootstrapping), although in practice the exact sampling error is typically unknown. Sampling error also refers more broadly to this phenomenon of random sampling variation.
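Sample-to-sample variation can be made visible by drawing many samples from one simulated population and recording each sampling error. The sketch below assumes a synthetic normal population; `simulate_sampling_errors` and all numeric settings are hypothetical.

```python
import random
import statistics


def simulate_sampling_errors(population, sample_size, n_samples, rng):
    """Return the sampling error of the mean for each of n_samples samples."""
    population_mean = statistics.fmean(population)
    errors = []
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)
        errors.append(statistics.fmean(sample) - population_mean)
    return errors


rng = random.Random(0)
# Hypothetical population of 100,000 heights in cm (assumed distribution).
population = [rng.gauss(170.0, 10.0) for _ in range(100_000)]

errors = simulate_sampling_errors(population, 100, 2000, rng)
print(f"mean of errors:   {statistics.fmean(errors):+.3f}")
print(f"spread of errors: {statistics.stdev(errors):.3f}")
```

Under these assumptions the errors scatter around zero, and their spread is close to the population standard deviation divided by the square root of the sample size.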

Random sampling, and its derived terms such as sampling error, imply specific procedures for gathering and analyzing data that are rigorously applied as a method for arriving at results considered representative of a given population as a whole. Despite a common misunderstanding, "random" does not mean the same thing as "chance" as this idea is often used in describing situations of uncertainty, nor is it the same as projections based on an assessed probability or frequency. Sampling always refers to a procedure of gathering data from a small aggregation of individuals that is purportedly representative of a larger grouping, which must in principle be capable of being measured as a totality. Random sampling is used precisely to ensure a truly representative sample from which to draw conclusions, in which the same results would be arrived at if one had included the entirety of the population instead.

Random sampling (and sampling error) can only be used to gather information about a single defined point in time. If additional data is gathered (other things remaining constant), then comparison across time periods may be possible. However, this comparison is distinct from any sampling itself. As a method for gathering data within the field of statistics, random sampling is recognized as clearly distinct from the causal process that one is trying to measure. The conducting of research itself may lead to certain outcomes affecting the researched group, but this effect is not what is called sampling error.

Sampling error always refers to the recognized limitations of any supposedly representative sample population in reflecting the larger totality, and the error refers only to the discrepancy that may result from judging the whole on the basis of a much smaller number. This is only an "error" in the sense that it would automatically be corrected if the totality were itself assessed. The term has no real meaning outside of statistics.


Reducing sampling error

The likely size of the sampling error can generally be reduced by taking a larger random sample from the population,[3] although the cost of doing this in reality may be prohibitive. Since the sampling error can often be estimated beforehand as a function of the sample size, various methods of sample size determination are used to weigh the predicted accuracy of an estimator against the predicted cost of taking a larger sample.
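One common textbook approach to sample size determination for a mean, given here only as a sketch, uses the normal approximation: the standard error is σ/√n, and the smallest sample size giving a margin of error E at a chosen confidence level is n = (zσ/E)², rounded up. The sketch assumes a known population standard deviation and a population much larger than the sample; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def standard_error(sigma, n):
    """Standard error of the sample mean for a sample of size n."""
    return sigma / math.sqrt(n)


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n whose margin of error is at most `margin`.

    Assumes a normal approximation, a known sigma, and no finite
    population correction.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return math.ceil((z * sigma / margin) ** 2)


# Quadrupling the sample size halves the standard error.
print(standard_error(10.0, 100))  # 1.0
print(standard_error(10.0, 400))  # 0.5

# Heights with sigma = 10 cm, desired margin of 1 cm at 95% confidence.
print(required_sample_size(10.0, 1.0))  # 385
```

Because the error shrinks only with the square root of the sample size, each further gain in accuracy costs disproportionately more data, which is the trade-off that sample size determination tries to balance.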

Sampling bias

Sampling bias occurs when the sample is chosen in a way that makes some individuals less likely to be included than others, and it can dramatically increase the sampling error in a systematic way. For example, attempting to measure the average height of the entire human population, but measuring a sample only from a specific country, can result in a large over- or under-estimation.
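The effect of a biased sampling frame can be sketched by comparing a simple random sample from the whole population with a sample of the same size drawn from only one subgroup. The two groups, their mean heights, and the helper `error_of_mean` are invented for illustration.

```python
import random
import statistics


def error_of_mean(sampling_frame, sample_size, population_mean, rng):
    """Sample mean minus population mean, sampling only from the frame."""
    sample = rng.sample(sampling_frame, sample_size)
    return statistics.fmean(sample) - population_mean


rng = random.Random(0)
# Two hypothetical groups with different average heights in cm.
group_a = [rng.gauss(165.0, 7.0) for _ in range(50_000)]
group_b = [rng.gauss(175.0, 7.0) for _ in range(50_000)]
population = group_a + group_b
population_mean = statistics.fmean(population)

# Unbiased: every individual is equally likely to be chosen.
random_error = error_of_mean(population, 2000, population_mean, rng)
# Biased: only members of group B can ever be chosen.
biased_error = error_of_mean(group_b, 2000, population_mean, rng)

print(f"random sample error: {random_error:+.2f} cm")
print(f"biased sample error: {biased_error:+.2f} cm")
```

In this setup the biased error stays near the gap between group B and the overall mean regardless of sample size, whereas the random-sample error shrinks as the sample grows.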

In genetics

The term "sampling error" has also been used in a related but fundamentally different sense in the field of genetics; for example in the bottleneck effect or founder effect, when natural disasters or migrations dramatically reduce the size of a population, resulting in a smaller population that may or may not fairly represent the original one. This is a source of genetic drift (as certain alleles become more or less common), and has been referred to as "sampling error",[4] despite not being an "error" in the statistical sense.
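A toy simulation can illustrate how a bottleneck shifts allele frequencies. It is a deliberately simplified sketch: it assumes diploid survivors whose gene copies are drawn independently from a very large original population, and the function name and numbers are hypothetical.

```python
import random
import statistics


def allele_frequency_after_bottleneck(freq, n_survivors, rng):
    """Frequency of an allele among the gene copies of the survivors.

    Each of the 2 * n_survivors gene copies independently carries the
    allele with probability `freq` (assumed large source population).
    """
    copies = 2 * n_survivors
    count = sum(1 for _ in range(copies) if rng.random() < freq)
    return count / copies


rng = random.Random(0)
original_freq = 0.5  # assumed allele frequency before the bottleneck

# Repeat the bottleneck many times for a small and a large surviving group.
small = [allele_frequency_after_bottleneck(original_freq, 10, rng)
         for _ in range(500)]
large = [allele_frequency_after_bottleneck(original_freq, 1000, rng)
         for _ in range(500)]

print(f"spread with   10 survivors: {statistics.stdev(small):.3f}")
print(f"spread with 1000 survivors: {statistics.stdev(large):.3f}")
```

Under these assumptions the allele frequency varies far more from run to run when few individuals survive, which is the sense in which a small founding group may not fairly represent the original population.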

References

  1. Särndal, Swensson, and Wretman (1992), Model Assisted Survey Sampling, Springer-Verlag, ISBN 0-387-40620-4
  2. Burns, N.; Grove, S. K. (2009). The Practice of Nursing Research: Appraisal, Synthesis, and Generation of Evidence (6th ed.). St. Louis, MO: Saunders Elsevier. ISBN 978-1-4557-0736-2.
  3. Scheuren, Fritz (2005). "What is a Margin of Error?". What is a Survey? (PDF). Washington, D.C.: American Statistical Association. Retrieved 2008-01-08.
  4. Campbell, Neil A.; Reece, Jane B. (2002). Biology. Benjamin Cummings. pp. 450–451. ISBN 0-536-68045-0.