Draft:Combinatorial Purged Cross-Validation
Combinatorial Purged Cross-Validation (CPCV) is a model validation technique designed to address the specific challenges of time-series data, particularly in quantitative finance. It provides a statistically rigorous alternative to conventional cross-validation and walk-forward backtesting methods, which often yield overly optimistic performance estimates due to information leakage and overfitting.[1][2]
Background
Traditional cross-validation methods, such as k-fold cross-validation, are poorly suited to financial time series due to temporal dependencies and label overlap. In these settings, the assumption that samples are independent and identically distributed (i.i.d.) is violated. Moreover, in financial modeling, labels—such as the return over a future horizon—often overlap in time, making it difficult to isolate training and test data cleanly.[1]
Walk-forward backtesting, another common technique in finance, preserves temporal order but evaluates the model on a single sequence of test sets. This produces a performance estimate with high variance, as results are contingent on one specific historical path.[1]
Combinatorial Purged Cross-Validation addresses both of these limitations by systematically constructing multiple train-test splits, purging overlapping samples, and enforcing an embargo period to prevent information leakage. The result is a distribution of out-of-sample performance estimates, enabling robust statistical inference and more realistic assessment of a model's predictive power.[3]
Methodology
CPCV divides a time-series dataset into N sequential, non-overlapping groups. These groups preserve the temporal order of observations. Then, all combinations of k groups (where k < N) are selected as test sets, with the remaining N − k groups used for training. For each combination, the model is trained and evaluated under strict controls to prevent leakage.[3]
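The combinatorial enumeration of train-test splits described above can be sketched in a few lines of Python (function and variable names are illustrative; groups are represented by their indices):

```python
from itertools import combinations

def cpcv_splits(n_groups, k_test):
    """Yield (train, test) tuples of group indices for every combination."""
    all_groups = set(range(n_groups))
    for test in combinations(range(n_groups), k_test):
        train = tuple(sorted(all_groups - set(test)))
        yield train, test

# For N = 6 groups and k = 2 test groups there are C(6, 2) = 15 splits.
splits = list(cpcv_splits(6, 2))
```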
To eliminate potential contamination between training and test sets, CPCV introduces two additional mechanisms:
- Purging: Any training observations whose label horizon overlaps with the test period are excluded. This ensures that future information does not influence model training.
- Embargoing: Training observations that begin shortly after the end of each test period (typically a small percentage of the sample) are excluded. This prevents leakage due to delayed market reactions or auto-correlated features.
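The two mechanisms above can be sketched as follows, assuming each training observation carries a label interval [start, end] on the sample timeline (the array and parameter names are illustrative):

```python
import numpy as np

def purge_and_embargo(label_start, label_end, test_start, test_end, embargo):
    """Return indices of observations that may remain in the training set.

    Purging drops observations whose label interval overlaps the test window;
    embargoing additionally drops observations that begin within `embargo`
    time units after the test window ends.
    """
    label_start = np.asarray(label_start)
    label_end = np.asarray(label_end)
    purged = (label_start <= test_end) & (label_end >= test_start)
    embargoed = (label_start > test_end) & (label_start <= test_end + embargo)
    return np.flatnonzero(~(purged | embargoed))
```

For example, with a test window covering times 2 through 4 and an embargo of 2 units, an observation whose label spans [1, 2] is purged, and one whose label starts at time 5 is embargoed.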
Each data point appears in multiple test sets across different combinations. Because test groups are drawn combinatorially, this process produces multiple backtest "paths," each of which simulates a plausible market scenario. From these paths, practitioners can compute a distribution of performance statistics such as the Sharpe ratio, drawdown, or classification accuracy.
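Once per-path return series have been assembled from the test results, the distribution of performance statistics can be summarized, for example, as a set of annualized Sharpe ratios (the data below is synthetic, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# 5 backtest paths, each a synthetic daily return series over 252 days
path_returns = rng.normal(0.0005, 0.01, size=(5, 252))

# One annualized Sharpe ratio per path, yielding a distribution
sharpes = path_returns.mean(axis=1) / path_returns.std(axis=1) * np.sqrt(252)
mean_sharpe, sharpe_dispersion = sharpes.mean(), sharpes.std()
```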
Formal definition
Let N be the number of sequential groups into which the dataset is divided, and let k be the number of groups selected as the test set for each split. Then:
- The number of unique train-test combinations is given by the binomial coefficient C(N, k) = N! / (k!(N − k)!).
- Each group (and hence each observation) appears in C(N − 1, k − 1) test sets, and the test results can be recombined into φ(N, k) = (k/N) · C(N, k) = C(N − 1, k − 1) unique backtest paths.
This yields a distribution of performance metrics rather than a single point estimate, making it possible to apply Monte Carlo-based or probabilistic techniques to assess model robustness.
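These counts follow from standard binomial identities and can be checked directly (a minimal sketch):

```python
from math import comb

N, k = 6, 2
n_splits = comb(N, k)              # unique train-test combinations
appearances = comb(N - 1, k - 1)   # test sets containing any given group
n_paths = k * comb(N, k) // N      # equals comb(N - 1, k - 1)
```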
Illustrative example
Consider the case where N = 6 and k = 2. The number of possible test set combinations is C(6, 2) = 15. Each of the six groups appears in C(5, 1) = 5 test splits. Consequently, five distinct backtest paths can be constructed, each incorporating one appearance of every group.
Test group assignment matrix
This table shows the 15 test combinations, enumerated in lexicographic order. An "x" indicates that the corresponding group is included in the test set for that split.

Group | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | S11 | S12 | S13 | S14 | S15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
G1 | x | x | x | x | x | | | | | | | | | | |
G2 | x | | | | | x | x | x | x | | | | | | |
G3 | | x | | | | x | | | | x | x | x | | | |
G4 | | | x | | | | x | | | x | | | x | x | |
G5 | | | | x | | | | x | | | x | | x | | x |
G6 | | | | | x | | | | x | | | x | | x | x |
Backtest path assignment
Each group contributes to five different backtest paths. The number in each cell indicates the path to which the group's result is assigned for that split: a group's n-th test appearance, reading splits left to right, is assigned to path n.

Group | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | S11 | S12 | S13 | S14 | S15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
G1 | 1 | 2 | 3 | 4 | 5 | | | | | | | | | | |
G2 | 1 | | | | | 2 | 3 | 4 | 5 | | | | | | |
G3 | | 1 | | | | 2 | | | | 3 | 4 | 5 | | | |
G4 | | | 1 | | | | 2 | | | 3 | | | 4 | 5 | |
G5 | | | | 1 | | | | 2 | | | 3 | | 4 | | 5 |
G6 | | | | | 1 | | | | 2 | | | 3 | | 4 | 5 |
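The assignment rule (each group's n-th appearance in a test set, scanning splits left to right, goes to path n) can be sketched as follows, using 0-based group and split indices:

```python
from itertools import combinations

def path_assignment(n_groups, k_test):
    """Map (group, split) pairs to 1-based backtest path numbers."""
    splits = list(combinations(range(n_groups), k_test))
    seen = [0] * n_groups          # test appearances counted per group
    paths = {}
    for s, test in enumerate(splits):
        for g in test:
            seen[g] += 1
            paths[(g, s)] = seen[g]
    return paths

# With N = 6, k = 2 this reproduces the table above
paths = path_assignment(6, 2)
```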
Advantages
Combinatorial Purged Cross-Validation offers several key benefits over conventional methods:
- It produces a distribution of performance metrics, enabling more rigorous statistical inference.
- The method systematically eliminates lookahead bias through purging and embargoing.
- By simulating multiple historical scenarios, it reduces the dependence on any single market regime or realization.
- It supports high-confidence comparisons between competing models or strategies.
Limitations
The main limitation of CPCV is its computational cost, since the number of train-test combinations grows rapidly with N. This cost can be managed by sampling a limited number of splits from the space of all possible combinations.
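One way to cap the cost is to evaluate a random subset of the C(N, k) splits rather than enumerating all of them (a sketch; parameter names are illustrative):

```python
import random
from itertools import combinations

def sampled_splits(n_groups, k_test, n_samples, seed=0):
    """Draw a random subset of test-group combinations without replacement."""
    all_splits = list(combinations(range(n_groups), k_test))
    rng = random.Random(seed)
    return rng.sample(all_splits, min(n_samples, len(all_splits)))

# Evaluate 12 of the C(10, 2) = 45 possible splits
subset = sampled_splits(10, 2, 12)
```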
Applications
CPCV is commonly used in quantitative strategy research, especially for evaluating predictive models such as classifiers, regressors, and portfolio optimizers.[4] It has been applied to estimate realistic Sharpe ratios, assess the risk of overfitting, and support the use of statistical tools such as the Deflated Sharpe Ratio.[5][6]
References
[edit]- ^ a b Joubert, J. & Sestovic, D. & Barziy I. & Distaso, W. & Lopez de Prado, M. (2024): "Enhanced Backtesting for Practitioners." The Journal of Portfolio Management, Quantitative Tools 51(2), pp. 12 - 27. DOI: 10.3905/jpm.2024.1.637
- ^ Bailey, D. H., Borwein, J. M., López de Prado, M., & Zhu, Q. J. (2014): "The Probability of Backtest Overfitting." Journal of Computational Finance. 20(4).
- ^ a b López de Prado, M. (2018). Advances in Financial Machine Learning. John Wiley & Sons. ISBN 978-1-119-48208-6.
- ^ Lopez de Prado, M. (2018): "The 10 Reasons Most Machine Learning Funds Fail." The Journal of Portfolio Management, 44(6), pp. 120 - 133. DOI: 10.3905/jpm.2018.44.6.120
- ^ López de Prado, M. & Zoonekynd, V. (2025):"Correcting the Factor Mirage: A Research Protocol for Causal Factor Investing." Available at SSRN: https://ssrn.com/abstract=4697929 or http://dx.doi.org/10.2139/ssrn.4697929
- ^ Lopez de Prado, M. (2020): Machine Learning for Asset Managers. Cambridge University Press. https://www.amazon.com/Machine-Learning-Managers-Elements-Quantitative/dp/1108792898