Cross-entropy method

The cross-entropy (CE) method is a Monte Carlo method for importance sampling and optimization. It is applicable to both combinatorial and continuous problems, with either a static or noisy objective.

The method approximates the optimal importance sampling estimator by repeating two phases:^[1]

Draw a sample from a probability distribution.
Minimize the cross-entropy between this distribution and a target distribution to produce a better sample in the next iteration.

Reuven Rubinstein developed the method in the context of rare event simulation, where tiny probabilities must be estimated, for example in network reliability analysis, queueing models, or performance analysis of telecommunication systems. The method has also been applied to the traveling salesman, quadratic assignment, DNA sequence alignment, max-cut and buffer allocation problems.

Estimation via importance sampling

Consider the general problem of estimating the quantity

$\ell =\mathbb {E} _{\mathbf {u} }[H(\mathbf {X} )]=\int H(\mathbf {x} )\,f(\mathbf {x} ;\mathbf {u} )\,{\textrm {d}}\mathbf {x}$ ,

where $H$ is some performance function and $f(\mathbf {x} ;\mathbf {u} )$ is a member of some parametric family of distributions. Using importance sampling, this quantity can be estimated as

${\hat {\ell }}={\frac {1}{N}}\sum _{i=1}^{N}H(\mathbf {X} _{i}){\frac {f(\mathbf {X} _{i};\mathbf {u} )}{g(\mathbf {X} _{i})}}$ ,

where $\mathbf {X} _{1},\dots ,\mathbf {X} _{N}$ is a random sample from $g\,$ . For positive $H$ , the theoretically optimal importance sampling density (PDF) is given by

$g^{*}(\mathbf {x} )=H(\mathbf {x} )f(\mathbf {x} ;\mathbf {u} )/\ell$ .

This, however, depends on the unknown $\ell$ . The CE method aims to approximate the optimal PDF by adaptively selecting members of the parametric family that are closest (in the Kullback–Leibler sense) to the optimal PDF $g^{*}$ . Modifications that improve the parameter settings, the convergence, and the overall computational efficiency of the cross-entropy method on multi-objective optimization problems have also been reported.^[2]^[3]^[4]
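
To make the estimator $\hat{\ell}$ concrete, the Python sketch below (standard library only) estimates the small probability $\ell=\mathbb{P}(X\geq 3)$ for $X\sim N(0,1)$, i.e. $H(x)=\mathrm{I}_{\{x\geq 3\}}$ with $f(\cdot;u)$ the standard normal density. The sampling density $g=N(3,1)$ is a hand-picked assumption for illustration, not the optimal $g^{*}$, and the function names are hypothetical; the exact value, computed with the complementary error function, is printed for comparison.

```python
import math
import random


def normal_pdf(x: float, mean: float) -> float:
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def importance_sampling_estimate(gamma: float, shift: float, n: int,
                                 rng: random.Random) -> float:
    """Estimate ell = P(X >= gamma), X ~ N(0, 1), by sampling from g = N(shift, 1).

    Each sample contributes H(X_i) * f(X_i; u) / g(X_i) with H = I{x >= gamma}.
    """
    total = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)
        if x >= gamma:  # H(x) = 1 only inside the rare event
            total += normal_pdf(x, 0.0) / normal_pdf(x, shift)  # likelihood ratio
    return total / n


rng = random.Random(1)
gamma = 3.0
exact = 0.5 * math.erfc(gamma / math.sqrt(2.0))  # P(X >= 3) for X ~ N(0, 1)
estimate = importance_sampling_estimate(gamma, shift=gamma, n=10_000, rng=rng)
print(f"exact = {exact:.4e}, importance sampling estimate = {estimate:.4e}")
```

With N = 10,000, crude Monte Carlo sampling from $f$ itself would see only about 13 samples with $X\geq 3$ on average (a relative standard error of roughly 27%), whereas the shifted density places about half of its samples inside the event and yields an error of a few percent.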

Generic CE algorithm

1. Choose an initial parameter vector $\mathbf {v} ^{(0)}$ ; set t = 1.
2. Generate a random sample $\mathbf {X} _{1},\dots ,\mathbf {X} _{N}$ from $f(\cdot ;\mathbf {v} ^{(t-1)})$ .
3. Solve for $\mathbf {v} ^{(t)}$ , where
$\mathbf {v} ^{(t)}=\mathop {\textrm {argmax}} _{\mathbf {v} }{\frac {1}{N}}\sum _{i=1}^{N}H(\mathbf {X} _{i}){\frac {f(\mathbf {X} _{i};\mathbf {u} )}{f(\mathbf {X} _{i};\mathbf {v} ^{(t-1)})}}\log f(\mathbf {X} _{i};\mathbf {v} )$
4. If convergence is reached, stop; otherwise, increase t by 1 and reiterate from step 2.

In several cases, the solution to step 3 can be found analytically. Situations in which this occurs are:

When $f\,$ belongs to the natural exponential family
When $f\,$ is discrete with finite support
When $H(\mathbf {X} )=\mathrm {I} _{\{\mathbf {X} \in A\}}$ and $f(\mathbf {X} _{i};\mathbf {u} )=f(\mathbf {X} _{i};\mathbf {v} ^{(t-1)})$ , $\mathbf {v} ^{(t)}$ is the maximum likelihood estimator based on the samples $\mathbf {X} _{k}\in A$ .
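
As a minimal sketch of steps 1 to 4, the Python code below applies the generic algorithm to the rare-event problem $\ell=\mathbb{P}(X\geq\gamma)$ with $X\sim N(u,1)$ and $H(x)=\mathrm{I}_{\{x\geq\gamma\}}$. It assumes the family $f(\cdot;v)=N(v,1)$, a natural exponential family in $v$, for which the argmax in step 3 reduces to the likelihood-ratio-weighted mean of the samples that reach $\gamma$; in the first iteration $v^{(0)}=u$, so all weights equal 1 and the update is the plain maximum likelihood estimate from the last case above. A fixed number of iterations stands in for the convergence test, and all names are illustrative.

```python
import math
import random


def normal_pdf(x: float, mean: float) -> float:
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def cross_entropy_rare_event(gamma: float, u: float = 0.0, n: int = 10_000,
                             iterations: int = 5, seed: int = 0):
    """Generic CE algorithm for ell = P(X >= gamma), X ~ N(u, 1).

    Uses the family f(.; v) = N(v, 1); returns the final parameter v and an
    importance sampling estimate of ell that samples from f(.; v).
    """
    rng = random.Random(seed)
    v = u  # step 1: start from the nominal parameter, v^(0) = u
    for _ in range(iterations):  # fixed count replaces the convergence test (assumption)
        # Step 2: sample X_1, ..., X_N from f(.; v^(t-1)).
        xs = [rng.gauss(v, 1.0) for _ in range(n)]
        # Step 3: for N(v, 1) the argmax is the weighted mean of the samples with
        # H(X_i) = 1, using weights W_i = f(X_i; u) / f(X_i; v^(t-1)).
        num = den = 0.0
        for x in xs:
            if x >= gamma:
                w = normal_pdf(x, u) / normal_pdf(x, v)
                num += w * x
                den += w
        if den == 0.0:
            raise RuntimeError("no sample reached gamma; increase n")
        v = num / den
    # Final importance sampling estimate of ell with g = f(.; v).
    xs = [rng.gauss(v, 1.0) for _ in range(n)]
    total = sum(normal_pdf(x, u) / normal_pdf(x, v) for x in xs if x >= gamma)
    return v, total / n


v_final, ell_hat = cross_entropy_rare_event(gamma=3.0)
exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))
print(f"v = {v_final:.3f}, estimate = {ell_hat:.3e}, exact = {exact:.3e}")
```

For this problem the iterates settle near $\mathbb{E}[X\mid X\geq 3]\approx 3.28$, since the weighted mean in step 3 estimates the mean of $X$ conditioned on the event regardless of the current $v$.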

Continuous optimization—example

The same CE algorithm can be used for optimization, rather than estimation. Reported applications of the cross-entropy optimization method include the dynamic economic dispatch problem with a unit start-stop plan^[5], the parametrization of micromilling processes^[6], energy scheduling problems^[7] and robotic automated storage and retrieval systems^[8].

Suppose the problem is to maximize some function $S(x)$ , for example, $S(x)={\textrm {e}}^{-(x-2)^{2}}+0.8\,{\textrm {e}}^{-(x+2)^{2}}$ , which has its global maximum near $x=2$ and a lower local maximum near $x=-2$ . To apply CE, one first considers the associated stochastic problem of estimating $\mathbb {P} _{\boldsymbol {\theta }}(S(X)\geq \gamma )$ for a given level $\gamma \,$ and parametric family $\left\{f(\cdot ;{\boldsymbol {\theta }})\right\}$ , for example the 1-dimensional Gaussian distribution, parameterized at iteration $t$ by its mean $\mu _{t}\,$ and variance $\sigma _{t}^{2}$ (so ${\boldsymbol {\theta }}=(\mu ,\sigma ^{2})$ here). Hence, for a given $\gamma \,$ , the goal is to find ${\boldsymbol {\theta }}$ so that $D_{\mathrm {KL} }({\textrm {I}}_{\{S(x)\geq \gamma \}}\|f_{\boldsymbol {\theta }})$ is minimized. This is done by solving the sample version (stochastic counterpart) of the KL divergence minimization problem, as in step 3 above. It turns out that the parameters that minimize the stochastic counterpart for this choice of target distribution and parametric family are the sample mean and sample variance of the elite samples, which are those samples with objective function value $\geq \gamma$ . The worst of the elite samples is then used as the level parameter for the next iteration. The resulting randomized algorithm, given in pseudo-code below, coincides with the so-called Estimation of Multivariate Normal Algorithm (EMNA), an estimation of distribution algorithm.
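
Written out for the Gaussian family, with the likelihood-ratio factor equal to 1 as in the last analytic case listed under step 3, the stochastic counterpart is a maximum likelihood fit to the elite samples. In this sketch, $\mathcal {E}_{t}=\{i:S(X_{i})\geq \gamma _{t}\}$ and $N_{e}=|\mathcal {E}_{t}|$ are notation introduced here:

$\left(\mu _{t},\sigma _{t}^{2}\right)=\mathop {\textrm {argmax}} _{\mu ,\sigma ^{2}}\sum _{i\in \mathcal {E}_{t}}\left(-{\tfrac {1}{2}}\log(2\pi \sigma ^{2})-{\frac {(X_{i}-\mu )^{2}}{2\sigma ^{2}}}\right)$ .

Setting the partial derivatives with respect to $\mu$ and $\sigma ^{2}$ to zero gives

$\mu _{t}={\frac {1}{N_{e}}}\sum _{i\in \mathcal {E}_{t}}X_{i},\qquad \sigma _{t}^{2}={\frac {1}{N_{e}}}\sum _{i\in \mathcal {E}_{t}}(X_{i}-\mu _{t})^{2}$ ,

which correspond to the updates mu:=mean(X(1:Ne)) and sigma2:=var(X(1:Ne)) in the pseudo-code, up to the normalization used by var.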

Pseudo-code

// Initialize parameters
mu:=-6
sigma2:=100
t:=0
maxits:=100
N:=100
Ne:=10
// Convergence tolerance on the sampling variance (value chosen for illustration)
epsilon:=1e-8
// While maxits not exceeded and not converged
while t < maxits and sigma2 > epsilon
  // Obtain N samples from current sampling distribution
  X:=SampleGaussian(mu,sigma2,N)
  // Evaluate objective function at sampled points
  S:=exp(-(X-2)^2) + 0.8 exp(-(X+2)^2)
  // Sort X by objective function values in descending order
  X:=sort(X,S)
  // Update parameters of sampling distribution                  
  mu:=mean(X(1:Ne))
  sigma2:=var(X(1:Ne))
  t:=t+1
// Return mean of final sampling distribution as solution
return mu
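
A runnable Python translation of the pseudo-code, offered as a sketch: it uses only the standard library, keeps the same initial values (mu = -6, sigma2 = 100, N = 100, Ne = 10, at most 100 iterations), assumes the tolerance epsilon = 1e-8, and refits the variance by maximum likelihood (dividing by Ne). The function names are hypothetical.

```python
import math
import random
import statistics


def objective(x: float) -> float:
    """S(x) = exp(-(x-2)^2) + 0.8 exp(-(x+2)^2), the example objective."""
    return math.exp(-(x - 2.0) ** 2) + 0.8 * math.exp(-(x + 2.0) ** 2)


def ce_maximize(mu: float = -6.0, sigma2: float = 100.0, max_its: int = 100,
                n: int = 100, n_elite: int = 10, epsilon: float = 1e-8,
                seed: int = 0) -> float:
    """Cross-entropy maximization of `objective` with a 1-D Gaussian family."""
    rng = random.Random(seed)
    t = 0
    while t < max_its and sigma2 > epsilon:
        # Draw N samples from the current sampling distribution N(mu, sigma2).
        xs = [rng.gauss(mu, math.sqrt(sigma2)) for _ in range(n)]
        # Sort by objective value, best first, and keep the Ne elite samples.
        elite = sorted(xs, key=objective, reverse=True)[:n_elite]
        # Refit mean and (population) variance to the elite samples.
        mu = statistics.fmean(elite)
        sigma2 = statistics.pvariance(elite, mu)
        t += 1
    # The mean of the final sampling distribution is the returned solution.
    return mu


mu_star = ce_maximize()
print(f"estimated maximizer: {mu_star:.6f}")
```

Because the variance shrinks quickly once the elite samples cluster around one peak, a run typically ends close to the global maximizer near x = 2, but CE is not guaranteed to find the global optimum and a run can settle at the local maximum near x = -2 instead.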

Related methods

Journal Papers

De Boer, P-T., Kroese, D.P., Mannor, S. and Rubinstein, R.Y. (2005). A Tutorial on the Cross-Entropy Method. Annals of Operations Research, 134 (1), 19–67.
Rubinstein, R.Y. (1997). Optimization of Computer Simulation Models with Rare Events. European Journal of Operational Research, 99, 89–112.

Software Implementations

CEoptim R package

References

^ Rubinstein, R.Y. and Kroese, D.P. (2004), The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning, Springer-Verlag, New York. ISBN 978-0-387-21240-1.
^ Bekker, J.; Aldrich, C. (2011). "The cross-entropy method in multi-objective optimisation: An assessment". European Journal of Operational Research. 211 (1): 112–121. doi:10.1016/j.ejor.2010.10.028.
^ Giagkiozis, I.; Purshouse, R.C.; Fleming, P.J. (2014). "Generalized decomposition and cross entropy methods for many-objective optimization". Information Sciences. 282: 363–387. doi:10.1016/j.ins.2014.05.045.
^ Haber, R.E.; Beruvides, G.; Quiza, R.; Hernandez, A. (2017). "A simple multi-objective optimization based on the cross-entropy method". IEEE Access. 5: 22272–22281. doi:10.1109/access.2017.2764047.
^ Xie, M.; Du, Y.; Wei, W.; Liu, M. (2019). "A cross-entropy-based hybrid membrane computing method for power system unit commitment problems". Energies. 12 (3). doi:10.3390/en12030486.
^ La Fe, I.; Beruvides, G.; Quiza, R.; Haber, R.E.; Rivas, M. (2019). "Automatic Selection of Optimal Parameters Based on Simple Soft-Computing Methods: A Case Study of Micromilling Processes". IEEE Transactions on Industrial Informatics. 15 (2): 800–811. doi:10.1109/tii.2018.2816971.
^ Wang, L.; Li, Q.; Zhang, B.; Ding, R.; Sun, M. (2019). "Robust multi-objective optimization for energy production scheduling in microgrids". Engineering Optimization. 51 (2): 332–351. doi:10.1080/0305215x.2018.1457655.
^ Foumani, M.; Moeini, A.; Haythorpe, M.; Smith-Miles, K. (2019). "A cross-entropy method for optimising robotic automated storage and retrieval systems". International Journal of Production Research. 56 (19): 6450–6472. doi:10.1080/00207543.2018.1456692.
