Neural architecture search
Neural architecture search (NAS) uses machine learning to automate the design of artificial neural networks. Various approaches to NAS have designed networks that compare well with hand-designed systems. The basic search loop is to propose a candidate model, evaluate it against a dataset, and use the result as feedback to improve the strategy that proposes the next candidates.[1]
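A minimal sketch of this propose-evaluate-feedback loop is shown below, with simple random sampling standing in for the learned search strategy; the search space and the helper names (sample_architecture, train_and_evaluate) are illustrative and not taken from any specific NAS system.

```python
import random

# Illustrative search space: number of layers, units per layer, activation.
SEARCH_SPACE = {
    "num_layers": [2, 4, 6],
    "units": [32, 64, 128],
    "activation": ["relu", "tanh"],
}

def sample_architecture():
    """Propose a candidate model description (here: by random sampling)."""
    return {key: random.choice(values) for key, values in SEARCH_SPACE.items()}

def train_and_evaluate(architecture):
    """Stand-in for training the candidate and measuring validation accuracy.
    A real NAS system would build and train a network here."""
    return random.random()  # placeholder reward

def search(num_trials=100):
    best_arch, best_reward = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture()       # propose a candidate model
        reward = train_and_evaluate(arch)  # evaluate it against a dataset
        if reward > best_reward:           # feedback: keep the best so far
            best_arch, best_reward = arch, reward
    return best_arch, best_reward

if __name__ == "__main__":
    print(search(num_trials=10))
```

In learned NAS systems the random proposal step is replaced by a model (for example a recurrent controller) whose parameters are updated from the evaluation feedback.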
AutoML
The AutoML[1] recurrent network trained with reinforcement learning, applied to the CIFAR-10 dataset, yielded a network architecture that rivals the best manually designed architectures for accuracy, with a test error rate of 3.65%, 0.09 percentage points better and 1.05x faster than the previous model that used a similar design. On the Penn Treebank dataset, the controller composed a recurrent cell that outperforms LSTM, reaching a test set perplexity of 62.4, or 3.6 perplexity better than the prior leading system. On the PTB character language modeling task it achieved a perplexity of 1.214.[2]
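The sketch below illustrates the reinforcement-learning signal used to train such a controller. It is a heavy simplification, assuming independent softmax distributions over a toy set of architectural decisions and a placeholder reward, whereas the actual controller is a recurrent network that emits decisions sequentially; only the REINFORCE-style policy-gradient update is faithful to the general idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simplified controller: one softmax distribution per architectural decision.
CHOICES = {"filter_size": [3, 5, 7], "num_filters": [32, 64, 128]}
logits = {name: np.zeros(len(options)) for name, options in CHOICES.items()}

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sample():
    """Sample an architecture and remember which index was chosen."""
    arch, indices = {}, {}
    for name, options in CHOICES.items():
        p = softmax(logits[name])
        i = rng.choice(len(options), p=p)
        arch[name], indices[name] = options[i], i
    return arch, indices

def reward_of(arch):
    """Placeholder for the validation accuracy of the trained child network."""
    return rng.random()

baseline, lr = 0.0, 0.1
for step in range(100):
    arch, indices = sample()
    r = reward_of(arch)
    baseline = 0.9 * baseline + 0.1 * r        # moving-average baseline
    advantage = r - baseline
    for name, i in indices.items():
        p = softmax(logits[name])
        grad = -p                              # REINFORCE: grad of log p(choice)
        grad[i] += 1.0
        logits[name] += lr * advantage * grad  # push toward higher-reward choices
```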
NASNet
Learning a model architecture directly on a large dataset is a lengthy process. NASNet[3] addressed this issue by transferring a building block designed for a small dataset to a larger dataset. The design was constrained to use two types of convolutional cells that return feature maps serving two main functions when convolving an input feature map: "Normal Cells" that return maps of the same extent (height and width) and "Reduction Cells" in which the returned feature map height and width are reduced by a factor of two. For the Reduction Cell, the initial operation applied to the cell's inputs uses a stride of two (to reduce the height and width).[4] The learned aspect of the design included elements such as which lower layer(s) each higher layer took as input, the transformations applied at that layer, and how multiple outputs were merged at each layer. In the studied example, the best convolutional layer (or "cell") was designed for the CIFAR-10 dataset and then applied to the ImageNet dataset by stacking copies of this cell, each with its own parameters. The approach yielded a top-1 accuracy of 82.7% and a top-5 accuracy of 96.2%. This exceeded the best human-invented architectures while requiring 9 billion fewer FLOPS, a reduction of 28%. The system continued to exceed the manually designed alternative at varying computation levels. The image features learned from image classification can be transferred to other computer vision problems. For example, for object detection, the learned cells integrated with the Faster-RCNN framework improved performance by 4.0% on the COCO dataset.[4]
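The sketch below shows the Normal/Reduction cell pattern and how stacked copies scale to a larger input, written with PyTorch. It is a simplification: the searched NASNet cells combine several learned operations over multiple inputs, whereas here each cell is reduced to a single convolution block, and all class and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class NormalCell(nn.Module):
    """Returns a feature map with the same height and width (stride 1)."""
    def __init__(self, channels):
        super().__init__()
        self.op = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )
    def forward(self, x):
        return self.op(x)

class ReductionCell(nn.Module):
    """Halves height and width: the first op applied to the input uses stride 2."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.op = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
        )
    def forward(self, x):
        return self.op(x)

def build_network(num_normal_per_stage=3, widths=(32, 64, 128)):
    """Stack copies of the cells, each with its own parameters; more copies
    and wider stages are one way to scale the same cells to a larger dataset."""
    layers = [nn.Conv2d(3, widths[0], kernel_size=3, padding=1)]
    for i, width in enumerate(widths):
        layers += [NormalCell(width) for _ in range(num_normal_per_stage)]
        if i + 1 < len(widths):
            layers.append(ReductionCell(width, widths[i + 1]))
    return nn.Sequential(*layers)

x = torch.randn(1, 3, 32, 32)    # a CIFAR-10-sized input
print(build_network()(x).shape)  # spatial size is halved at each reduction cell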
Hill-climbing
Another group used a hill-climbing procedure that applies network morphisms, followed by short cosine-annealing optimization runs. The approach yielded surprisingly competitive results, requiring resources on the same order of magnitude as training a single network. For example, on CIFAR-10 the method designed and trained a network with an error rate below 6% in 12 hours on a single GPU.[5]
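The following sketch shows the shape of such a hill-climbing loop. It is only illustrative: real network morphisms (for example widening a layer or inserting an identity layer) change the architecture while preserving the function the trained network computes, whereas this sketch only mutates an architecture description and scores it with a placeholder for the short training run.

```python
import copy
import random

def random_morphism(arch):
    """Apply one illustrative 'morphism' to an architecture description."""
    child = copy.deepcopy(arch)
    if random.random() < 0.5:
        child["layers"].append({"units": random.choice([32, 64, 128])})  # deepen
    else:
        layer = random.choice(child["layers"])
        layer["units"] *= 2                                              # widen
    return child

def short_train_and_score(arch):
    """Placeholder for a short cosine-annealed training run returning accuracy."""
    return random.random()

def hill_climb(steps=8, children_per_step=4):
    current = {"layers": [{"units": 32}]}
    current_score = short_train_and_score(current)
    for _ in range(steps):
        candidates = [random_morphism(current) for _ in range(children_per_step)]
        scored = [(short_train_and_score(c), c) for c in candidates]
        best_score, best = max(scored, key=lambda t: t[0])
        if best_score > current_score:   # keep a child only if it improves
            current, current_score = best, best_score
    return current, current_score

print(hill_climb())
```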
Efficient Neural Architecture Search
In Efficient Neural Architecture Search (ENAS), a controller discovers neural network architectures by learning to search for an optimal subgraph within a large graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on a validation set. The model corresponding to the selected subgraph is trained to minimize a canonical cross-entropy loss. Because multiple child models share parameters, ENAS requires fewer GPU-hours than other approaches, and 1000-fold fewer than "standard" NAS. On CIFAR-10, the ENAS design achieved a test error of 2.89%, comparable to NASNet. On Penn Treebank, the ENAS design reached a test perplexity of 55.8.[6]
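A minimal sketch of the parameter-sharing idea follows, assuming a toy "large graph" of candidate operations and a random stand-in for the learned controller; every child model selected from the graph reuses the same shared weights, which is what avoids training each candidate from scratch. The class and parameter names are illustrative.

```python
import random
import torch
import torch.nn as nn

class SharedSupergraph(nn.Module):
    """Toy large graph: each layer offers several candidate operations whose
    parameters are shared across all sampled child models."""
    def __init__(self, dim=32, num_layers=4):
        super().__init__()
        self.candidates = nn.ModuleList([
            nn.ModuleList([
                nn.Linear(dim, dim),                            # candidate op 0
                nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),  # candidate op 1
                nn.Identity(),                                  # candidate op 2
            ])
            for _ in range(num_layers)
        ])

    def forward(self, x, choices):
        """Run the subgraph picked by `choices` (one candidate index per layer)."""
        for layer_ops, choice in zip(self.candidates, choices):
            x = layer_ops[choice](x)
        return x

supergraph = SharedSupergraph()
# Stand-in for the learned controller: sample one candidate per layer.
choices = [random.randrange(3) for _ in supergraph.candidates]
x = torch.randn(8, 32)
out = supergraph(x, choices)  # the sampled child model reuses the shared parameters
print(choices, out.shape)
```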
References
- ^ a b Zoph, Barret; Le, Quoc V. (May 17, 2017). "Using Machine Learning to Explore Neural Network Architecture". Research Blog. Retrieved 2018-02-20.
- ^ Zoph, Barret; Le, Quoc V. (2016-11-04). "Neural Architecture Search with Reinforcement Learning". arXiv:1611.01578 [cs].
- ^ Zoph, Barret; Vasudevan, Vijay; Shlens, Jonathon; Le, Quoc V. (November 2, 2017). "AutoML for large scale image classification and object detection". Research Blog. Retrieved 2018-02-20.
- ^ a b Zoph, Barret; Vasudevan, Vijay; Shlens, Jonathon; Le, Quoc V. (2017-07-21). "Learning Transferable Architectures for Scalable Image Recognition". arXiv:1707.07012 [cs, stat].
- ^ Elsken, Thomas; Metzen, Jan-Hendrik; Hutter, Frank (2017-11-13). "Simple and Efficient Architecture Search for Convolutional Neural Networks".
- ^ Pham, Hieu; Guan, Melody Y.; Zoph, Barret; Le, Quoc V.; Dean, Jeff (2018-02-09). "Efficient Neural Architecture Search via Parameter Sharing".