
Data generating process

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Marcocapelle (talk | contribs) at 18:32, 20 May 2016 (removed Category:Statistical terminology; added Category:Statistical models using HotCat). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The term data generating process is used in statistical and scientific literature to convey a number of different ideas:

  • the data collection process, that is, the routes and procedures by which data reach a database (particularly where these may change over time);
  • a specific statistical model used to represent supposed random variations in observations, often in terms of explanatory and/or latent variables;
  • a notional and non-specific probabilistic model (not directly described or explicitly set down) that would include all of the random influences that combine to produce individual observations. One instance is the supposed justification for the "common occurrence" of the normal distribution as the combined result of many random additive effects.
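The second and third senses can be contrasted with a small simulation. The sketch below is illustrative only, and all of its names and parameter values are hypothetical. The function linear_dgp is a specific, explicitly written-down model (y = 1 + 2x plus normal noise), so ordinary least squares applied to its output should approximately recover the chosen parameters. The function additive_effects_dgp builds each observation as the sum of many small, independent, non-normal (uniform) effects. By the central limit theorem, the resulting values should look approximately normal, with roughly 68% of them lying within one standard deviation of the mean.

```python
import random
import statistics


def linear_dgp(n, intercept=1.0, slope=2.0, noise_sd=0.5, seed=0):
    """Hypothetical specific DGP: y = intercept + slope * x + e, e ~ N(0, noise_sd^2)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def additive_effects_dgp(n, k=50, seed=0):
    """Hypothetical notional DGP: each observation sums k independent uniform(-1, 1) effects."""
    rng = random.Random(seed)
    return [sum(rng.uniform(-1.0, 1.0) for _ in range(k)) for _ in range(n)]


def ols_fit(xs, ys):
    """Ordinary least-squares estimates of (intercept, slope) for simple linear regression."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def share_within_one_sd(values):
    """Fraction of values within one (population) standard deviation of their mean."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(abs(v - m) <= s for v in values) / len(values)


if __name__ == "__main__":
    xs, ys = linear_dgp(2000)
    print("estimated (intercept, slope):", ols_fit(xs, ys))  # close to (1.0, 2.0)
    obs = additive_effects_dgp(5000)
    print("share within one SD:", share_within_one_sd(obs))  # close to 0.68
```

In the first case, the model is fully specified, so its parameters can be checked against estimates. In the second, no single effect is normal, and approximate normality emerges only from the combination of many effects. This mirrors the informal justification described in the third sense.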