User:MatrixHe/sandbox

MgNet[1] is an abstract and unified mathematical framework that simultaneously recovers some ResNet-type[2][3] convolutional neural networks (CNNs) and multigrid methods[4][5] for solving discretized partial differential equations (PDEs). MgNet can be obtained by making only minor modifications to a classic geometric multigrid method. Connections between ResNet and classical multigrid methods were already acknowledged in the original ResNet paper[2], from the viewpoint of how residuals are used in both methods. MgNet[1] makes this connection more direct and explicit: a class of efficient CNN models can be obtained by making minor modifications to a typical multigrid cycle while keeping the same algorithmic structure.

Main structure and connections with ResNet

The so-called data space and feature space for CNNs, which are analogous to a function space and its dual space in the theory of multigrid methods[5], are introduced to examine further connections between CNNs and multigrid methods. With these concepts, MgNet and follow-up work propose the constrained data-feature mapping model on every grid (level) \ell,

A^{\ell} \ast u^{\ell} = f^{\ell},

where f^{\ell} belongs to the data space and u^{\ell} belongs to the feature space such that

u^{\ell} \ge 0.

The feature extraction process can then be obtained through an iterative procedure for solving the above system on each grid. For example, applying a single-step residual correction to the above system gives

u^{\ell,i} = u^{\ell,i-1} + \sigma \circ B^{\ell,i} \ast \sigma\left(f^{\ell} - A^{\ell} \ast u^{\ell,i-1}\right), \qquad i = 1, \ldots, \nu_{\ell},

with u^{\ell,0} = 0, where \sigma denotes the activation function (for example ReLU) and B^{\ell,i} are trainable convolution kernels.
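As an illustration, the following is a minimal sketch of this extraction (smoothing) iteration on a single grid, written with PyTorch convolutions. The class name, the choice of ReLU for \sigma, the shared 3x3 kernels, and the equal channel counts for data and features are assumptions made for the example, not specifics fixed by [1].

import torch
import torch.nn as nn
import torch.nn.functional as F

class MgNetSmoother(nn.Module):
    """nu steps of single-step residual correction on one grid:
    u^{i} = u^{i-1} + sigma(B^{i} * sigma(f - A * u^{i-1})), with u^{0} = 0.
    A is shared across the steps on this grid; each step has its own B^{i}."""

    def __init__(self, channels, nu):
        super().__init__()
        self.A = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.B = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            for _ in range(nu)
        )

    def forward(self, f):
        u = torch.zeros_like(f)              # u^{0} = 0
        for B_i in self.B:
            r = F.relu(f - self.A(u))        # sigma(f - A * u^{i-1})
            u = u + F.relu(B_i(r))           # single residual correction step
        return u

# Usage: extract features u from data f on one grid (illustrative shapes).
f = torch.randn(8, 16, 32, 32)
u = MgNetSmoother(channels=16, nu=2)(f)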

If we consider the residual r^{\ell,i} = f^{\ell} - A^{\ell} \ast u^{\ell,i} of the above iteration, we have

r^{\ell,i} = r^{\ell,i-1} - A^{\ell} \ast \sigma \circ B^{\ell,i} \ast \sigma\left(r^{\ell,i-1}\right).

This is exactly the basic block scheme in pre-activation ResNet[3].
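To make the correspondence concrete, here is a short sketch (again assuming PyTorch convolutions and ReLU for \sigma; the kernel shapes are illustrative) of one such residual update, which has the form of a pre-activation residual block with W_1 = B^{\ell,i} and W_2 = -A^{\ell}, batch normalization omitted.

import torch
import torch.nn.functional as F

def residual_update(r, A_weight, B_weight):
    # r^{i} = r^{i-1} - A * sigma(B^{i} * sigma(r^{i-1}))
    #       = r^{i-1} + W2 * sigma(W1 * sigma(r^{i-1})), with W1 = B^{i}, W2 = -A,
    # i.e. the basic pre-activation ResNet block (batch norm omitted).
    s = F.conv2d(F.relu(r), B_weight, padding=1)   # sigma, then B^{i}
    s = F.conv2d(F.relu(s), A_weight, padding=1)   # sigma, then A
    return r - s

# Toy usage with random 3x3 kernels (illustrative shapes).
r = torch.randn(1, 8, 16, 16)
A_w = torch.randn(8, 8, 3, 3)
B_w = torch.randn(8, 8, 3, 3)
r_next = residual_update(r, A_w, B_w)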

Connections with other CNN architectures

The above iterative scheme can be interpreted both as the feature extraction step in ResNet-type models and as the smoothing step in multigrid methods. Under this framework, several successful CNN architectures can be understood as different smoothing steps, for example:

CNN architecture    Smoothing method in multigrid
ResNet[2][3]        Single-step residual correction
DenseNet[6]         Multi-step residual correction
LM-ResNet[7]        Chebyshev semi-iterative residual correction

Summary

In contrast to the dynamical systems viewpoint, the MgNet framework opens a new door to the mathematical understanding, analysis and improvement of deep learning models. The preliminary results presented in [1] demonstrate the potential of MgNet from both theoretical and practical viewpoints. Many aspects of MgNet remain to be explored and improved. Only a few techniques from multigrid methods have been tried in [1], and many more in-depth multigrid techniques require further study for deep neural networks, especially CNNs. In particular, it is believed that the MgNet framework will lead to improved CNNs that need only a small fraction of the weights required by current CNNs. Conversely, techniques from CNNs can also be used to develop a new generation of multigrid, and especially algebraic multigrid, methods[5] for solving partial differential equations.

  1. He, Juncai; Xu, Jinchao (2019). "MgNet: A unified framework of multigrid and convolutional neural network". Science China Mathematics. 62 (7): 1331–1354. doi:10.1007/s11425-019-9547-2. ISSN 1674-7283.
  2. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2015). "Deep Residual Learning for Image Recognition". arXiv:1512.03385.
  3. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). "Identity Mappings in Deep Residual Networks". arXiv:1603.05027.
  4. Xu, Jinchao (1992). "Iterative Methods by Space Decomposition and Subspace Correction". SIAM Review. 34 (4): 581–613. doi:10.1137/1034116. ISSN 0036-1445.
  5. Xu, Jinchao; Zikatanov, Ludmil (2017). "Algebraic multigrid methods". Acta Numerica. 26: 591–721. doi:10.1017/S0962492917000083. ISSN 0962-4929.
  6. Huang, Gao; Liu, Zhuang; van der Maaten, Laurens; Weinberger, Kilian Q. (2016). "Densely Connected Convolutional Networks". arXiv:1608.06993.
  7. Lu, Yiping; Zhong, Aoxiao; Li, Quanzheng; Dong, Bin (2017). "Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations". arXiv:1710.10121.