
MgNet


MgNet[1] is an abstract and unified mathematical framework, proposed by Juncai He and Jinchao Xu, that simultaneously recovers some convolutional neural networks (CNNs) for image classification and multigrid (MG) methods for solving discretized partial differential equations (PDEs). The model is based on close connections between the CNN and MG methodologies: for example, the pooling operation and feature extraction in CNN correspond directly to the restriction operation and iterative smoothers in MG, respectively. As the solution space is often the dual of the data space in PDEs, the analogous concepts of feature space and data space (which are dual to each other) are introduced for CNNs. With these connections and new concepts in the unified model, the role of the various convolution and pooling operations used in CNNs can be better understood. As a result, modified CNN models (with fewer weights and hyperparameters) have been developed that exhibit competitive and sometimes better performance than existing CNN models on the CIFAR-10 and CIFAR-100 data sets.

Main structure and connections with ResNet

The so-called data space and feature space for CNNs, which are analogous to a function space and its dual in the theory of multigrid methods,[2] are introduced to examine further connections between CNN and multigrid. With this new concept, MgNet and subsequent work propose the constrained data-feature mapping model on every grid $\ell$ as

$$A^\ell \ast u^\ell = f^\ell,$$

where $f^\ell$ belongs to the data space and $u^\ell$ belongs to the feature space, subject to the constraint

$$u^\ell \ge 0.$$

The feature extraction process can then be obtained through an iterative procedure for solving the above system on each grid. For example, applying a single-step residual correction scheme to the above system gives

$$u^{\ell,i} = u^{\ell,i-1} + \sigma\left(B^{\ell,i} \ast \sigma\left(f^\ell - A^\ell \ast u^{\ell,i-1}\right)\right), \qquad i = 1, \dots, \nu_\ell,$$

with $u^{\ell,0} = 0$, where $\sigma$ denotes a pointwise activation function such as ReLU and $B^{\ell,i}$ plays the role of an iterative smoother.
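
A minimal PyTorch sketch of this smoothing step is given below. It assumes that $A^\ell$ and $B^{\ell,i}$ are realized as 3x3 convolutions and that $\sigma$ is ReLU, as in the experiments of [1]; the tensor shapes, variable names, and random kernels are illustrative assumptions, not the paper's actual implementation.

  import torch
  import torch.nn.functional as F

  def mgnet_smoothing_step(u, f, A, B):
      """One residual-correction (smoothing) step:
      u <- u + sigma(B * sigma(f - A * u)).
      u: current features; f: data; A, B: conv kernels (C_out, C_in, 3, 3)."""
      r = torch.relu(f - F.conv2d(u, A, padding=1))   # activated residual
      return u + torch.relu(F.conv2d(r, B, padding=1))

  # Usage: nu_l smoothing iterations on one grid, starting from u = 0.
  f = torch.randn(1, 16, 32, 32)           # data tensor on this grid
  A = 0.1 * torch.randn(16, 16, 3, 3)      # data-feature mapping kernel
  u = torch.zeros_like(f)
  for i in range(4):                       # nu_l = 4
      B = 0.1 * torch.randn(16, 16, 3, 3)  # B^{l,i} may differ per step
      u = mgnet_smoothing_step(u, f, A, B)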

Considering the residual $r^{\ell,i} = f^\ell - A^\ell \ast u^{\ell,i}$ of the above iteration and substituting the update for $u^{\ell,i}$ yields

$$r^{\ell,i} = r^{\ell,i-1} - A^\ell \ast \sigma\left(B^{\ell,i} \ast \sigma\left(r^{\ell,i-1}\right)\right).$$

Up to the sign of $A^\ell$, which can be absorbed into the kernel, this is exactly the basic block scheme of the pre-activation ResNet.[3]
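
For comparison, the corresponding pre-activation residual block, written in the same sketch style (batch normalization omitted for brevity; kernel names are illustrative), computes exactly the update on $r$ above:

  import torch
  import torch.nn.functional as F

  def preact_block(r, A, B):
      """Pre-activation residual block acting on the residual r:
      r <- r - A * sigma(B * sigma(r)),
      i.e. one step of the MgNet residual iteration."""
      s = torch.relu(F.conv2d(torch.relu(r), B, padding=1))
      return r - F.conv2d(s, A, padding=1)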

Connections with other CNN architectures

The above iterative scheme can be interpreted both as the feature extraction step in ResNet-type models and as the smoothing step in multigrid methods. Under this framework, several successful CNN architectures can be understood as different smoothing schemes, for example (a sketch of a multi-step variant follows the table):

  CNN architecture     Smoothing method in MG
  ResNet[4][3]         Single-step residual correction
  DenseNet[5]          Multi-step residual correction
  LM-ResNet[6]         Chebyshev semi-iterative residual correction
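
To illustrate how a multi-step scheme differs from the single-step correction, the following hedged sketch implements the linear multi-step update used by LM-ResNet,[6] $x_{i+1} = (1 - k)\,x_i + k\,x_{i-1} + g(x_i)$, taking $g$ to be the pre-activation residual branch from the block above; the momentum coefficient $k$ and the kernel names are illustrative assumptions.

  import torch
  import torch.nn.functional as F

  def lm_resnet_step(x_prev, x_curr, k, A, B):
      """Linear multi-step (LM-ResNet) update:
      x_next = (1 - k) * x_curr + k * x_prev + g(x_curr),
      where g is the pre-activation residual branch."""
      g = -F.conv2d(torch.relu(F.conv2d(torch.relu(x_curr), B, padding=1)),
                    A, padding=1)
      return (1 - k) * x_curr + k * x_prev + g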

Summary

In contrast to the dynamical-systems viewpoint, the MgNet framework opens a new door to the mathematical understanding, analysis, and improvement of deep learning models. The preliminary results presented in [1] demonstrate the potential of MgNet from both theoretical and practical viewpoints. Many aspects of MgNet remain to be explored and improved: only a few techniques from multigrid methods were tried in [1], and many more in-depth multigrid techniques require further study in the context of deep neural networks, especially CNNs. In particular, it is believed that the MgNet framework can lead to improved CNNs that require only a small fraction of the weights of current CNNs. Conversely, techniques from CNNs can also be used to develop a new generation of multigrid, and especially algebraic multigrid, methods[2] for solving partial differential equations.

  1. ^ a b c He, Juncai; Xu, Jinchao (July 2019). "MgNet: A unified framework of multigrid and convolutional neural network". Science China Mathematics. 62 (7): 1331–1354. doi:10.1007/s11425-019-9547-2. ISSN 1674-7283.
  2. ^ a b Xu, Jinchao; Zikatanov, Ludmil (May 2017). "Algebraic multigrid methods". Acta Numerica. 26: 591–721. doi:10.1017/S0962492917000083. ISSN 0962-4929.
  3. ^ a b He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016-03-16). "Identity Mappings in Deep Residual Networks". arXiv:1603.05027.
  4. ^ He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2015-12-10). "Deep Residual Learning for Image Recognition". arXiv:1512.03385.
  5. ^ Huang, Gao; Liu, Zhuang; van der Maaten, Laurens; Weinberger, Kilian Q. (2016-08-25). "Densely Connected Convolutional Networks". arXiv:1608.06993.
  6. ^ Lu, Yiping; Zhong, Aoxiao; Li, Quanzheng; Dong, Bin (2017-10-27). "Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations". arXiv:1710.10121.