
Stochastic optimization



Mathematical techniques for optimization aim to provide a formal means of making the best decisions in a wide range of practical problems. Given the difficulty of many real-world problems and the inherent uncertainty in the information available for carrying out the task, methods of stochastic optimization have been playing a rapidly growing role.

Stochastic optimization comes about from one or both of the following properties (Spall, 2003):

A. There is random noise in the measurements of the criterion to be optimized and/or related information (such as the gradient vector of the criterion for “smooth” problems).

B. There is a random choice made in the search direction as the algorithm iterates toward a solution.

The above two properties contrast with classical deterministic search and optimization, where it is assumed that one has perfect information about the loss function (and gradients, if relevant) and that this information is used to determine the search direction deterministically at every step of the algorithm. In many practical problems such information is not available, making deterministic algorithms ill-suited.

Property A arises in such areas as real-time estimation and control, simulation-based optimization where Monte Carlo simulations are run as estimates of an actual system (e.g., Fu, 2002), and problems where there is experimental (random) error in the measurements of the criterion. A prominent example of a class of algorithms for property A is stochastic approximation (SA) (Robbins and Monro, 1951), including stochastic gradient descent, finite-difference SA (Kiefer and Wolfowitz, 1952), and simultaneous perturbation SA (Spall, 1992; http://www.jhuapl.edu/SPSA).
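As an illustration of Property A, the following minimal Python sketch (assuming NumPy; the gain-sequence constants, the function name spsa_minimize, and the noisy quadratic test loss are illustrative assumptions rather than details taken from Spall, 1992) estimates the gradient from only two noisy loss measurements per iteration, in the manner of simultaneous perturbation SA.

    import numpy as np

    def spsa_minimize(noisy_loss, theta0, n_iter=1000, a=0.1, c=0.1,
                      A=100, alpha=0.602, gamma=0.101, rng=None):
        # Minimal sketch of simultaneous perturbation SA: only noisy
        # measurements of the loss are used, never exact gradients.
        rng = np.random.default_rng() if rng is None else rng
        theta = np.asarray(theta0, dtype=float).copy()
        for k in range(n_iter):
            a_k = a / (k + 1 + A) ** alpha        # decaying step size
            c_k = c / (k + 1) ** gamma            # decaying perturbation size
            delta = rng.choice([-1.0, 1.0], size=theta.shape)  # random +/-1 perturbation
            y_plus = noisy_loss(theta + c_k * delta)
            y_minus = noisy_loss(theta - c_k * delta)
            g_hat = (y_plus - y_minus) / (2.0 * c_k * delta)   # gradient estimate from two measurements
            theta -= a_k * g_hat
        return theta

    # Usage: minimize a quadratic observed only through noisy measurements.
    rng = np.random.default_rng(0)
    noisy_quadratic = lambda th: float(np.sum(th ** 2)) + rng.normal(scale=0.1)
    print(spsa_minimize(noisy_quadratic, theta0=[2.0, -1.5], rng=rng))

Stochastic gradient descent and finite-difference SA follow the same recursion; they differ only in how the noisy gradient estimate (g_hat above) is formed.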

Regarding the second defining property of stochastic optimization, Property B above, it is sometimes beneficial to deliberately introduce randomness into the search process as a means of speeding convergence and making the algorithm less sensitive to modeling errors. This injected (Monte Carlo) randomness is usually generated by computer-based pseudorandom number generators. The injected randomness may also provide the impetus needed to move away from a local solution when searching for a global optimum. Prominent examples of stochastic optimization algorithms satisfying Property B are genetic algorithms (Goldberg, 1989; http://www-illigal.ge.uiuc.edu), simulated annealing, and random search (Zhigljavsky, 1991); there are many more such algorithms.
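As a sketch of Property B, the following Python fragment (assuming NumPy; the logarithmic cooling schedule, step size, and the Rastrigin-style test function are illustrative assumptions, not details from the references below) injects randomness both into the candidate points and into the acceptance decision, in the manner of simulated annealing; the occasional acceptance of uphill moves is what allows escape from local minima.

    import numpy as np

    def simulated_annealing(loss, x0, n_iter=5000, step=0.5, t0=1.0, rng=None):
        # Minimal simulated-annealing sketch: the randomness is injected
        # by the algorithm itself (Property B), not by noisy measurements.
        rng = np.random.default_rng() if rng is None else rng
        x = np.asarray(x0, dtype=float).copy()
        f = loss(x)
        best_x, best_f = x.copy(), f
        for k in range(1, n_iter + 1):
            temp = t0 / np.log(k + 1.0)                        # slowly decreasing "temperature"
            cand = x + rng.normal(scale=step, size=x.shape)    # randomly perturbed candidate
            f_cand = loss(cand)
            # Always accept improvements; accept uphill moves with probability
            # exp(-increase / temp), which permits escape from local minima.
            if f_cand < f or rng.random() < np.exp(-(f_cand - f) / temp):
                x, f = cand, f_cand
                if f < best_f:
                    best_x, best_f = x.copy(), f
        return best_x, best_f

    # Usage: a multimodal test function with many local minima.
    rastrigin = lambda x: 10.0 * x.size + float(np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x)))
    print(simulated_annealing(rastrigin, np.array([3.0, -2.0]), rng=np.random.default_rng(1)))

Genetic algorithms and random search rely on the same principle of injected randomness, differing in how candidate points are generated and selected.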


References

  • Fu, M. C. (2002), “Optimization for Simulation: Theory vs. Practice” (with discussion by S. Andradóttir, P. Glynn, and J. P. Kelly), INFORMS Journal on Computing, vol. 14, pp. 192-227.
  • Goldberg, D. E. (1989), Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, MA.
  • Kiefer, J. and Wolfowitz, J. (1952), “Stochastic Estimation of a Regression Function,” Annals of Mathematical Statistics, vol. 23, pp. 462-466.
  • Michalewicz, Z. and Fogel, D. B. (2000), How to Solve It: Modern Heuristics, Springer-Verlag, New York.
  • Robbins, H. and Monro, S. (1951), “A Stochastic Approximation Method,” Annals of Mathematical Statistics, vol. 22, pp. 400-407.
  • Spall, J. C. (2003), Introduction to Stochastic Search and Optimization, Wiley, Hoboken, NJ.
  • Spall, J. C. (1992), “Multivariate Stochastic Approximation Using a Simultaneous Perturbation Gradient Approximation,” IEEE Transactions on Automatic Control, vol. 37, pp. 332-341.
  • Zhigljavsky, A. A. (1991), Theory of Global Random Search, Kluwer Academic, Boston.