Circular analysis

Circular analysis is the selection of the parameters of an analysis using the same data that are then analysed. It is often referred to as "double dipping", because the same data are used twice: once to choose the analysis and once to test it. Circular analysis inflates the apparent statistical strength of results and, in the most extreme case, can produce a strongly significant result from pure noise.
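The inflation can be seen in a small simulation. The sketch below is illustrative only. It assumes a hypothetical experiment in which 1,000 "voxels" and a behavioural score are all independent Gaussian noise over 20 trials, so the true correlation of every voxel with behaviour is zero. It picks the voxel with the strongest correlation and then reports that correlation on the same data. The function names and parameter values are invented for this example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def circular_peak_correlation(n_trials=20, n_voxels=1000, seed=0):
    """Hypothetical double-dipping demo: select and evaluate on the same data."""
    rng = random.Random(seed)
    # Hypothetical behavioural score per trial: pure noise.
    behaviour = [rng.gauss(0, 1) for _ in range(n_trials)]
    # Hypothetical voxels: independent noise, so every true correlation is 0.
    voxels = [[rng.gauss(0, 1) for _ in range(n_trials)] for _ in range(n_voxels)]
    # Circular step 1: pick the voxel that correlates best with behaviour...
    correlations = [pearson(v, behaviour) for v in voxels]
    best = max(range(n_voxels), key=lambda i: abs(correlations[i]))
    # ...circular step 2: report that voxel's correlation on the same data.
    return correlations[best]


r = circular_peak_correlation()
print(f"reported correlation: {r:.2f} (true value: 0)")
```

Because the reported value is the largest of 1,000 noisy estimates, it lies far from zero even though no voxel carries any signal.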

Examples

In its simplest form, circular analysis can involve deciding to remove outliers after noticing that doing so improves the results of an experiment. The effect can also be more subtle. fMRI data, for example, often require considerable pre-processing, and the pre-processing steps might be adjusted incrementally until the analysis 'works'. Similarly, the classifiers used in multivoxel pattern analysis (MVPA) of fMRI data have parameters that could be tuned to maximise the classification accuracy on the very data being reported.
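The outlier example can be sketched in the same way. The code below is an illustration, not a recommended procedure. It assumes two groups of 20 observations drawn from the same distribution, a hypothetical menu of outlier cutoffs, and a fixed |t| > 2.0 threshold as a rough stand-in for a two-sided test at the 5% level. Keeping whichever cutoff gives the largest t statistic can only raise the false-positive rate compared with committing to one rule in advance, because the pre-specified analysis is one of the options being searched over.

```python
import random
import statistics

# Hypothetical menu of outlier rules (None = keep every observation).
CUTOFFS = [None, 3.0, 2.5, 2.0, 1.5]
# Rough two-sided 5% threshold for about 40 observations (an approximation).
CRITICAL_T = 2.0


def drop_outliers(xs, cutoff):
    """Remove values more than `cutoff` sample SDs from the mean."""
    if cutoff is None:
        return xs
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return [x for x in xs if abs(x - m) <= cutoff * s]


def welch_t(a, b):
    """Welch's two-sample t statistic."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / (va / len(a) + vb / len(b)) ** 0.5


def false_positive_rates(n_sims=2000, n_per_group=20, seed=1):
    rng = random.Random(seed)
    fixed_hits = flexible_hits = 0
    for _ in range(n_sims):
        # Both groups come from the same distribution: any "effect" is noise.
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        ts = [abs(welch_t(drop_outliers(a, c), drop_outliers(b, c))) for c in CUTOFFS]
        # Pre-specified analysis: always use the first rule (no removal).
        fixed_hits += int(ts[0] > CRITICAL_T)
        # Circular analysis: keep whichever rule "worked" best on this data.
        flexible_hits += int(max(ts) > CRITICAL_T)
    return fixed_hits / n_sims, flexible_hits / n_sims


fixed, flexible = false_positive_rates()
print(f"false-positive rate, fixed rule: {fixed:.3f}; best of {len(CUTOFFS)} rules: {flexible:.3f}")
```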

Solutions

Planning the analysis carefully before the data are collected ensures that the choice of analysis cannot be affected by the data. Alternatively, one might refine the classification on one or two participants and then apply the resulting analysis, unchanged, to the remaining participants' data. For the selection of classification parameters, a common method is to divide the data into two sets, find the optimum parameter value using the first set, and then test that value on the second set. This is a standard technique used, for example, by the Princeton MVPA classification library.
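A minimal sketch of the split-data approach follows. It assumes a k-nearest-neighbour classifier whose single parameter is k; this classifier is a stand-in chosen for brevity, and the Princeton MVPA library's own functions are not reproduced here. k is chosen by leave-one-out accuracy within the first half of the data only, and the chosen value is then applied once to the untouched second half. With pure-noise features and fair-coin labels the held-out accuracy should average close to the 50% chance level. The classifier, data sizes and candidate k values are all assumptions made for illustration.

```python
import math
import random


def knn_predict(train_x, train_y, x, k):
    """Majority label (0/1) among the k training points nearest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0  # k is odd, so there are no ties


def loo_accuracy(xs, ys, k):
    """Leave-one-out accuracy computed only within the tuning set."""
    hits = 0
    for i in range(len(xs)):
        rest_x, rest_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        hits += int(knn_predict(rest_x, rest_y, xs[i], k) == ys[i])
    return hits / len(xs)


def split_half_accuracy(rng, n_trials=40, n_features=10, ks=(1, 3, 5, 7)):
    # Hypothetical data: noise features, independent fair-coin labels.
    xs = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_trials)]
    ys = [rng.randint(0, 1) for _ in range(n_trials)]
    half = n_trials // 2
    tune_x, tune_y = xs[:half], ys[:half]
    test_x, test_y = xs[half:], ys[half:]
    # Step 1: choose k using the tuning half only.
    best_k = max(ks, key=lambda k: loo_accuracy(tune_x, tune_y, k))
    # Step 2: apply that fixed k once to the held-out half.
    hits = sum(knn_predict(tune_x, tune_y, x, best_k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_x)


rng = random.Random(2)
accuracies = [split_half_accuracy(rng) for _ in range(100)]
mean_accuracy = sum(accuracies) / len(accuracies)
print(f"mean held-out accuracy on noise: {mean_accuracy:.3f}")
```

Because the second half plays no part in choosing k, its accuracy is an unbiased estimate; reporting the best tuning-set score instead would reintroduce the circularity.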
