Rayleigh–Ritz method

The Rayleigh–Ritz method is a direct numerical method for approximating eigenvalues. It originated in the context of solving physical boundary value problems and is named after Lord Rayleigh and Walther Ritz.

The name Rayleigh–Ritz is debated^[1] ^[2] against the alternative name Ritz method, after Walther Ritz, since the numerical procedure was published by Walther Ritz in 1908–1909. According to Leissa,^[1] Lord Rayleigh wrote a paper congratulating Ritz on his work in 1911, while stating that he himself had used Ritz's method in many places in his book and in another publication. This statement, although later disputed, together with the fact that the method in the trivial case of a single vector reduces to the Rayleigh quotient, makes the arguable misnomer persist. According to Ilanko,^[2] citing Richard Courant, both Lord Rayleigh and Walther Ritz independently conceived the idea of utilizing the equivalence between boundary value problems of partial differential equations on the one hand and problems of the calculus of variations on the other hand for numerical calculation of the solutions, by substituting for the variational problems simpler approximating extremum problems in which a finite number of parameters need to be determined. Ironically for the debate, the modern justification of the algorithm drops the calculus of variations in favor of the simpler and more general approach of orthogonal projection, as in the Galerkin method named after Boris Galerkin, which also leads to the name Ritz–Galerkin method.

The method is used in applications that involve approximating eigenvalues and eigenvectors, often under different names. In quantum mechanics, where a system of particles is described using a Hamiltonian, the Ritz method uses trial wave functions to approximate the ground state eigenfunction, i.e. the one with the lowest energy. In the finite element method context, mathematically the same algorithm is commonly called the Ritz–Galerkin method. The terms Rayleigh–Ritz method and Ritz method are typical in mechanical and structural engineering, where the method approximates the eigenmodes and resonant frequencies of a structure.

For matrix eigenvalue problems

In numerical linear algebra, the Rayleigh–Ritz method is commonly^[3] applied to approximate an eigenvalue problem

A{\textbf {x}}=\lambda {\textbf {x}}

for the matrix $A\in \mathbb {C} ^{N\times N}$ of size $N$ using a projected matrix of a smaller size $m<N$ , generated from a given matrix $V\in \mathbb {C} ^{N\times m}$ with orthonormal columns. The matrix version of the algorithm is the simplest:

1. Compute the $m\times m$ matrix $V^{*}AV$ , where $V^{*}$ denotes the complex-conjugate transpose of $V$ .
2. Solve the eigenvalue problem $V^{*}AV\mathbf {y} _{i}=\mu _{i}\mathbf {y} _{i}$ .
3. Compute the Ritz vectors ${\tilde {\textbf {x}}}_{i}=V{\textbf {y}}_{i}$ and the Ritz values ${\tilde {\lambda }}_{i}=\mu _{i}$ .
4. Output approximations $({\tilde {\lambda }}_{i},{\tilde {\textbf {x}}}_{i})$ , called the Ritz pairs, to eigenvalues and eigenvectors of the original matrix $A$ .

If the subspace with the orthonormal basis given by the columns of the matrix $V\in \mathbb {C} ^{N\times m}$ contains $k\leq m$ vectors that are close to eigenvectors of the matrix $A$ , the Rayleigh–Ritz method above finds $k$ Ritz vectors that well approximate these eigenvectors. The easily computable quantity $\|A{\tilde {\textbf {x}}}_{i}-{\tilde {\lambda }}_{i}{\tilde {\textbf {x}}}_{i}\|$ determines the accuracy of such an approximation for every Ritz pair.
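The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `rayleigh_ritz` and the random test data are assumptions made here, and the sketch assumes a Hermitian $A$ so that `numpy.linalg.eigh` can be used for the small projected problem.

```python
import numpy as np

def rayleigh_ritz(A, V):
    """Return Ritz values, Ritz vectors and residual norms of A on span(V).

    Assumes A is Hermitian and V has orthonormal columns.
    """
    H = V.conj().T @ A @ V                 # step 1: m-by-m projected matrix V* A V
    mu, Y = np.linalg.eigh(H)              # step 2: small Hermitian eigenvalue problem
    X = V @ Y                              # step 3: Ritz vectors, one per column
    # accuracy indicator ||A x_i - mu_i x_i|| for every Ritz pair
    residuals = np.linalg.norm(A @ X - X * mu, axis=0)
    return mu, X, residuals

# Hypothetical test data: a random real symmetric matrix and a random subspace.
rng = np.random.default_rng(0)
N, m = 50, 5
B = rng.standard_normal((N, N))
A = (B + B.T) / 2
V, _ = np.linalg.qr(rng.standard_normal((N, m)))   # orthonormal basis via QR
ritz_values, ritz_vectors, residuals = rayleigh_ritz(A, V)
print(ritz_values)
print(residuals)
```

For a random subspace the residual norms are generally not small; they shrink as the subspace gets closer to containing eigenvectors of $A$.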

In the simplest case $m=1$ , the $N\times m$ matrix $V$ turns into a unit column-vector $v$ , the $m\times m$ matrix $V^{*}AV$ is a scalar that is equal to the Rayleigh quotient $\rho (v)=v^{*}Av/v^{*}v$ , the only $i=1$ solution to the eigenvalue problem is $y_{i}=1$ and $\mu _{i}=\rho (v)$ , and the only Ritz vector is $v$ itself. Thus, the Rayleigh–Ritz method reduces to computing the Rayleigh quotient if $m=1$ .

Another useful connection to the Rayleigh quotient is that $\mu _{i}=\rho ({\tilde {\textbf {x}}}_{i})$ for every Ritz pair $({\tilde {\lambda }}_{i},{\tilde {\textbf {x}}}_{i})$ , which allows one to derive some properties of the Ritz values $\mu _{i}$ from the corresponding theory for the Rayleigh quotient. For example, if $A$ is a Hermitian matrix, its Rayleigh quotient (and thus every Ritz value) is real and takes values within the closed interval between the smallest and largest eigenvalues of $A$ .
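Both properties can be checked numerically. The following is a small sketch on assumed random data (the helper name `rayleigh_quotient` and the test matrices are illustrative choices, not part of the method): it compares each Ritz value with the Rayleigh quotient of its Ritz vector and tests the eigenvalue bounds for a Hermitian matrix.

```python
import numpy as np

def rayleigh_quotient(A, x):
    """rho(x) = x* A x / x* x for a nonzero vector x."""
    return (x.conj() @ A @ x) / (x.conj() @ x)

# Hypothetical test data: random complex Hermitian A and a random subspace basis V.
rng = np.random.default_rng(1)
N, m = 30, 4
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = (B + B.conj().T) / 2
V, _ = np.linalg.qr(rng.standard_normal((N, m)) + 1j * rng.standard_normal((N, m)))

mu, Y = np.linalg.eigh(V.conj().T @ A @ V)   # Ritz values
X = V @ Y                                    # Ritz vectors
rho = np.array([rayleigh_quotient(A, X[:, i]) for i in range(m)])
eigs = np.linalg.eigvalsh(A)                 # exact eigenvalues, ascending

# Ritz values equal the Rayleigh quotients of the Ritz vectors and are real
print(np.allclose(rho.real, mu), np.allclose(rho.imag, 0.0))
# Ritz values lie between the smallest and largest eigenvalues of A
print(eigs[0] <= mu.min(), mu.max() <= eigs[-1])
```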

Example

The matrix

A={\begin{bmatrix}2&0&0\\0&2&1\\0&1&2\end{bmatrix}}

has eigenvalues $1,2,3$ and the corresponding eigenvectors

\mathbf {x} _{\lambda =1}={\begin{bmatrix}0\\1\\-1\end{bmatrix}},\quad \mathbf {x} _{\lambda =2}={\begin{bmatrix}1\\0\\0\end{bmatrix}},\quad \mathbf {x} _{\lambda =3}={\begin{bmatrix}0\\1\\1\end{bmatrix}}.

Let us take

V={\begin{bmatrix}0&0\\1&0\\0&1\end{bmatrix}},

then

V^{*}AV={\begin{bmatrix}2&1\\1&2\end{bmatrix}}

with eigenvalues $1,3$ and the corresponding eigenvectors

\mathbf {y} _{\mu =1}={\begin{bmatrix}1\\-1\end{bmatrix}},\quad \mathbf {y} _{\mu =3}={\begin{bmatrix}1\\1\end{bmatrix}},

so that the Ritz values are $1,3$ and the Ritz vectors are

\mathbf {\tilde {x}} _{{\tilde {\lambda }}=1}={\begin{bmatrix}0\\1\\-1\end{bmatrix}},\quad \mathbf {\tilde {x}} _{{\tilde {\lambda }}=3}={\begin{bmatrix}0\\1\\1\end{bmatrix}}.

We observe that, for the given $V$ , each Ritz vector is exactly one of the eigenvectors of $A$ and the Ritz values are exactly two of the three eigenvalues of $A$ . The approximation is exact because the column space of the matrix $V$ coincides with the subspace spanned by the two eigenvectors $\mathbf {x} _{\lambda =1}$ and $\mathbf {x} _{\lambda =3}$ in this example.

For matrix singular value problems

Truncated singular value decomposition (SVD) in numerical linear algebra can also use the Rayleigh–Ritz method to find approximations to left and right singular vectors of the matrix $M\in \mathbb {C} ^{M\times N}$ of size $M$ -by- $N$ in given subspaces by turning the singular value problem into an eigenvalue problem.

Using the normal matrix

The singular value $\sigma$ and the corresponding left and right singular vectors $u$ and $v$ are defined by $Mv=\sigma u$ and $M^{*}u=\sigma v$ . Having found one set (left or right) of approximate singular vectors and singular values by naively applying the Rayleigh–Ritz method to the Hermitian normal matrix $M^{*}M\in \mathbb {C} ^{N\times N}$ or $MM^{*}\in \mathbb {C} ^{M\times M}$ , whichever is smaller, one could determine the other set of left or right singular vectors simply by dividing by the singular values, i.e., $u=Mv/\sigma$ and $v=M^{*}u/\sigma$ . However, the division is unstable or fails for small or zero singular values.

An alternative approach, e.g., defining the normal matrix as $A=M^{*}M\in \mathbb {C} ^{N\times N}$ of size $N$ -by- $N$ , takes advantage of the fact that for a given $N$ -by- $m$ matrix $W\in \mathbb {C} ^{N\times m}$ with orthonormal columns the eigenvalue problem of the Rayleigh–Ritz method for the $m$ -by- $m$ matrix

W^{*}AW=W^{*}M^{*}MW=(MW)^{*}MW

can be interpreted as a singular value problem for the $M$ -by- $m$ matrix $MW$ . This interpretation allows simple simultaneous calculation of both left and right approximate singular vectors as follows.

1. Compute the $M\times m$ matrix $MW$ .
2. Compute the thin, or economy-sized, SVD $MW=\mathbf {U} {\Sigma }\mathbf {V} _{h},$ with $M$ -by- $m$ matrix $\mathbf {U}$ , $m$ -by- $m$ diagonal matrix ${\Sigma }$ , and $m$ -by- $m$ matrix $\mathbf {V} _{h}$ .
3. Compute the matrices of the Ritz left $U=\mathbf {U}$ and right $V_{h}=\mathbf {V} _{h}W^{*}$ singular vectors.
4. Output approximations $U,\Sigma ,V_{h}$ , called the Ritz singular triplets, to selected singular values and the corresponding left and right singular vectors of the original matrix $M$ , representing an approximate truncated singular value decomposition (SVD) with right singular vectors restricted to the column space of the matrix $W$ .

The algorithm can be used as a post-processing step where the matrix $W$ is an output of an eigenvalue solver, such as LOBPCG, that numerically approximates selected eigenvectors of the normal matrix $A=M^{*}M$ .

Example

The matrix

M={\begin{bmatrix}1&0&0&0\\0&2&0&0\\0&0&3&0\\0&0&0&4\\0&0&0&0\end{bmatrix}}

has its normal matrix

A=M^{*}M={\begin{bmatrix}1&0&0&0\\0&4&0&0\\0&0&9&0\\0&0&0&16\end{bmatrix}},

singular values $1,2,3,4$ and the corresponding thin SVD

M={\begin{bmatrix}0&0&0&1\\0&0&1&0\\0&1&0&0\\1&0&0&0\\0&0&0&0\end{bmatrix}}{\begin{bmatrix}4&0&0&0\\0&3&0&0\\0&0&2&0\\0&0&0&1\end{bmatrix}}{\begin{bmatrix}0&0&0&1\\0&0&1&0\\0&1&0&0\\1&0&0&0\end{bmatrix}}.

Let us take

W={\begin{bmatrix}{\sqrt {2}}/2&{\sqrt {2}}/2\\{\sqrt {2}}/2&-{\sqrt {2}}/2\\0&0\\0&0\end{bmatrix}}.

Following the algorithm step 1, we compute

MW={\begin{bmatrix}{\sqrt {2}}/2&{\sqrt {2}}/2\\{\sqrt {2}}&-{\sqrt {2}}\\0&0\\0&0\\0&0\end{bmatrix}},

and on step 2 its thin SVD $MW=\mathbf {U} {\Sigma }\mathbf {V} _{h}$ with

\mathbf {U} ={\begin{bmatrix}0&1\\1&0\\0&0\\0&0\\0&0\end{bmatrix}},\quad \Sigma ={\begin{bmatrix}2&0\\0&1\end{bmatrix}},\quad \mathbf {V} _{h}={\begin{bmatrix}{\sqrt {2}}/2&-{\sqrt {2}}/2\\{\sqrt {2}}/2&{\sqrt {2}}/2\end{bmatrix}}.

Thus we already obtain the singular values 2 and 1 from $\Sigma$ and from $\mathbf {U}$ the corresponding two left singular vectors $u$ as $[0,1,0,0,0]^{*}$ and $[1,0,0,0,0]^{*}$ . The approximations are exact for the given $W$ because its column space is spanned by two exact right singular vectors of $M$ .

Finally, step 3 computes the matrix $V_{h}=\mathbf {V} _{h}W^{*}$

V_{h}={\begin{bmatrix}{\sqrt {2}}/2&-{\sqrt {2}}/2\\{\sqrt {2}}/2&{\sqrt {2}}/2\end{bmatrix}}\,{\begin{bmatrix}{\sqrt {2}}/2&{\sqrt {2}}/2&0&0\\{\sqrt {2}}/2&-{\sqrt {2}}/2&0&0\end{bmatrix}}={\begin{bmatrix}0&1&0&0\\1&0&0&0\end{bmatrix}}

recovering from its rows the two right singular vectors $v$ as $[0,1,0,0]^{*}$ and $[1,0,0,0]^{*}$ . We validate $Mv=\sigma u$

{\begin{bmatrix}1&0&0&0\\0&2&0&0\\0&0&3&0\\0&0&0&4\\0&0&0&0\end{bmatrix}}\,{\begin{bmatrix}0\\1\\0\\0\end{bmatrix}}=\,2\,{\begin{bmatrix}0\\1\\0\\0\\0\end{bmatrix}}

and $M^{*}u=\sigma v$

{\begin{bmatrix}1&0&0&0&0\\0&2&0&0&0\\0&0&3&0&0\\0&0&0&4&0\end{bmatrix}}\,{\begin{bmatrix}0\\1\\0\\0\\0\end{bmatrix}}=\,2\,{\begin{bmatrix}0\\1\\0\\0\end{bmatrix}}.

Thus, for the given matrix $W$ , whose column space is spanned by two exact right singular vectors, we determine these right singular vectors, as well as the corresponding left singular vectors and the singular values, all exactly. For an arbitrary matrix $W$ , we obtain approximate singular triplets which are optimal given $W$ in the sense of optimality of the Rayleigh–Ritz method.
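The computation in this example can be reproduced with a short NumPy sketch. The function name `ritz_svd` is an assumption made here for illustration; `numpy.linalg.svd` with `full_matrices=False` supplies the thin SVD of step 2. Singular vectors are determined only up to sign, so the computed vectors may differ from those above by a factor of $-1$ .

```python
import numpy as np

def ritz_svd(M, W):
    """Ritz singular triplets of M with right singular vectors in span(W).

    Assumes W has orthonormal columns.
    """
    MW = M @ W                                                    # step 1
    U, sigma, Vh_small = np.linalg.svd(MW, full_matrices=False)   # step 2: thin SVD
    Vh = Vh_small @ W.conj().T                                    # step 3
    return U, sigma, Vh

# Matrices M and W of the example
M = np.array([[1.0, 0, 0, 0],
              [0, 2.0, 0, 0],
              [0, 0, 3.0, 0],
              [0, 0, 0, 4.0],
              [0, 0, 0, 0]])
s = np.sqrt(2) / 2
W = np.array([[s, s],
              [s, -s],
              [0, 0],
              [0, 0]])

U, sigma, Vh = ritz_svd(M, W)
print(sigma)
# residuals of the defining relations M v = sigma u and M* u = sigma v
print(np.linalg.norm(M @ Vh.conj().T - U * sigma))
print(np.linalg.norm(M.conj().T @ U - Vh.conj().T * sigma))
```

The first residual vanishes for any $W$ with orthonormal columns by construction, so the second residual is the one that indicates how accurate the Ritz singular triplets are.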

Derivation from calculus of variations

Using this technique, we approximate the variational problem and end up with a finite dimensional problem. So let us start with the problem of seeking a function $y(x)$ that extremizes an integral $I[y(x)]$ . Assume that we are able to approximate $y(x)$ by a linear combination of linearly independent functions of the type,

$y(x)\approx \varphi _{0}(x)+c_{1}\varphi _{1}(x)+c_{2}\varphi _{2}(x)+\cdots +c_{N}\varphi _{N}(x)$

where $c_{1},c_{2},\cdots ,c_{N}$ are constants to be determined by a variational method - such as the one described below.

The selection of which approximating functions $\varphi _{i}(x)$ to use is arbitrary except for the following considerations:

a) If the problem has boundary conditions such as fixed end points, then $\varphi _{0}(x)$ is chosen to satisfy the problem’s boundary conditions, and all other $\varphi _{i}(x)$ vanish at the boundary.

b) If the form of the solution is known, then $\varphi _{i}(x)$ can be chosen so that $y(x)$ will have that form.

The expansion of $y(x)$ in terms of approximating functions replaces the variational problem of extremizing the functional integral $I[y(x)]$ with the problem of finding a set of constants $c_{1},c_{2},\cdots ,c_{N}$ that extremizes $I(c_{1},c_{2},\cdots ,c_{N})$ . We can now solve this by setting the partial derivatives to zero. For each value of $i$ ,

${\partial I \over \partial c_{i}}=0$

The procedure is to first determine an initial estimate of $c_{1}$ by the approximation $y(x)\approx \varphi _{0}(x)+c_{1}\varphi _{1}(x)$ . Next, the approximation $y(x)\approx \varphi _{0}(x)+c_{1}\varphi _{1}(x)+c_{2}\varphi _{2}(x)$ is used (with $c_{1}$ being redetermined). The process continues with $y(x)\approx \varphi _{0}(x)+c_{1}\varphi _{1}(x)+c_{2}\varphi _{2}(x)+c_{3}\varphi _{3}(x)$ as the third approximation and so on. At each stage the following two items are true:

At the $i^{th}$ stage, the terms $c_{1},\cdots ,c_{i-1}$ are redetermined
The approximation at the $i^{th}$ stage $y(x)\approx \varphi _{0}(x)+c_{1}\varphi _{1}(x)+\cdots +c_{i}\varphi _{i}(x)$ will be no worse than the approximation at the $(i-1)^{th}$ stage

Convergence of the procedure means that as $i$ tends to infinity, the approximation will tend towards the exact function $y(x)$ that extremizes an integral $I[y(x)]$ .

In many cases one uses a complete set of functions, e.g. polynomials or sines and cosines. A set of functions $\varphi _{i}(x)$ is called complete over $[a,b]$ if every Riemann integrable function $f(x)$ can be approximated arbitrarily closely, in the mean-square sense, by a linear combination $c_{1}\varphi _{1}(x)+c_{2}\varphi _{2}(x)+\cdots +c_{N}\varphi _{N}(x)$ with $N$ sufficiently large.

The above outlined procedure can be extended to cases with more than one independent variable.
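As a hedged illustration of the procedure, consider a model problem chosen here for simplicity (it is not taken from the text above): extremize $I[y]=\int _{0}^{1}\left({\tfrac {1}{2}}y'^{2}-y\right)dx$ subject to $y(0)=y(1)=0$ , whose exact solution is $y(x)=x(1-x)/2$ . The sketch assumes $\varphi _{0}=0$ and trial functions $\varphi _{k}(x)=\sin(k\pi x)$ , which vanish at both end points. Since $I$ is quadratic in the coefficients, the conditions $\partial I/\partial c_{i}=0$ form a linear system.

```python
import numpy as np

def ritz_solution(n_terms, n_quad=2001):
    """Ritz approximation for I[y] = int_0^1 (y'^2 / 2 - y) dx, y(0) = y(1) = 0.

    Assumed trial functions: phi_0 = 0 and phi_k(x) = sin(k pi x), k = 1..n_terms.
    """
    x = np.linspace(0.0, 1.0, n_quad)
    h = x[1] - x[0]
    w = np.full(n_quad, h)
    w[0] = w[-1] = h / 2                        # trapezoidal quadrature weights
    k = np.arange(1, n_terms + 1)[:, None]
    phi = np.sin(np.pi * k * x)                 # phi_k on the grid, shape (n_terms, n_quad)
    dphi = np.pi * k * np.cos(np.pi * k * x)    # derivatives phi_k'
    # I(c) = c^T K c / 2 - b^T c, so dI/dc_i = 0 for all i gives K c = b
    K = (dphi * w) @ dphi.T
    b = phi @ w
    c = np.linalg.solve(K, b)
    return x, c @ phi

def exact(x):
    """Exact extremal: solution of y'' = -1 with zero end values."""
    return x * (1 - x) / 2

for n in (1, 3, 5):
    x, y = ritz_solution(n)
    print(n, np.max(np.abs(y - exact(x))))      # maximum error on the grid
```

Each call recomputes all coefficients from scratch, which mirrors the redetermination of $c_{1},\cdots ,c_{i-1}$ at the $i^{th}$ stage.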

Applications in mechanical engineering

The Rayleigh–Ritz method is often used in mechanical engineering for finding the approximate real resonant frequencies of multi-degree-of-freedom systems, such as spring–mass systems or flywheels on a shaft with varying cross section. It is an extension of Rayleigh's method. It can also be used for finding buckling loads and post-buckling behaviour of columns.

Consider the case whereby we want to find the resonant frequency of oscillation of a system. First, write the oscillation in the form,

$y(x,t)=Y(x)\cos \omega t$

with an unknown mode shape $Y(x)$ . Next, find the total energy of the system, consisting of a kinetic energy term and a potential energy term. The kinetic energy term involves the square of the time derivative of $y(x,t)$ and thus gains a factor of $\omega ^{2}$ . Thus, we can calculate the total energy of the system and express it in the following form:

$E=T+V\equiv A[Y(x)]\omega ^{2}\sin ^{2}\omega t+B[Y(x)]\cos ^{2}\omega t$

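By conservation of energy, $E$ must not depend on time, which requires the coefficients of $\sin ^{2}\omega t$ and $\cos ^{2}\omega t$ to be equal; equivalently, the average kinetic energy must be equal to the average potential energy. Thus,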

$\omega ^{2}={\frac {B[Y(x)]}{A[Y(x)]}}=R[Y(x)]$

which is also known as the Rayleigh quotient. Thus, if we knew the mode shape $Y(x)$ , we would be able to calculate $A[Y(x)]$ and $B[Y(x)]$ , and in turn get the eigenfrequency. However, we do not yet know the mode shape. In order to find this, we can approximate $Y(x)$ as a combination of a few approximating functions $Y_{i}(x)$

$Y(x)=\sum _{i=1}^{N}c_{i}Y_{i}(x)$

where $c_{1},c_{2},\cdots ,c_{N}$ are constants to be determined. In general, an arbitrary choice of $c_{1},c_{2},\cdots ,c_{N}$ describes a superposition of the actual eigenmodes of the system. However, if we seek $c_{1},c_{2},\cdots ,c_{N}$ such that the squared eigenfrequency $\omega ^{2}$ is minimised, then the mode described by this set of $c_{1},c_{2},\cdots ,c_{N}$ will be close to the lowest actual eigenmode of the system, and the minimum approximates the lowest eigenfrequency. If we find eigenmodes orthogonal to this approximated lowest eigenmode, we can approximately find the next few eigenfrequencies as well.

In general, we can express $A[Y(x)]$ and $B[Y(x)]$ as a collection of terms quadratic in the coefficients $c_{i}$ :

$B[Y(x)]=\sum _{i}\sum _{j}c_{i}c_{j}K_{ij}={\bf {c^{T}Kc}}$

$A[Y(x)]=\sum _{i}\sum _{j}c_{i}c_{j}M_{ij}={\bf {c^{T}Mc}}$

where $K$ and $M$ are the stiffness matrix and mass matrix of a discrete system respectively.

The minimization of $\omega ^{2}$ becomes:

${\partial \omega ^{2} \over \partial c_{i}}={\partial \over \partial c_{i}}{\frac {\bf {c^{T}Kc}}{\bf {c^{T}Mc}}}=0$

Solving this,

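$\mathbf {c} ^{T}\mathbf {M} \mathbf {c} \,{\frac {\partial \,\mathbf {c} ^{T}\mathbf {K} \mathbf {c} }{\partial \mathbf {c} }}-\mathbf {c} ^{T}\mathbf {K} \mathbf {c} \,{\frac {\partial \,\mathbf {c} ^{T}\mathbf {M} \mathbf {c} }{\partial \mathbf {c} }}=0$

$\mathbf {K} \mathbf {c} -{\frac {\mathbf {c} ^{T}\mathbf {K} \mathbf {c} }{\mathbf {c} ^{T}\mathbf {M} \mathbf {c} }}\mathbf {M} \mathbf {c} =0$

$\mathbf {K} \mathbf {c} -\omega ^{2}\mathbf {M} \mathbf {c} =0$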

For a non-trivial solution of c, we require the determinant of the matrix coefficient of c to be zero.

$\det({\bf {{K}-\omega ^{2}{\bf {M}}}})=0$

This gives a solution for the first N eigenfrequencies and eigenmodes of the system, with N being the number of approximating functions.
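A minimal sketch of this computation follows, for a model problem assumed here for illustration (it does not appear in the text): a string of unit length, tension and density, fixed at both ends, whose exact eigenfrequencies are $\omega _{n}=n\pi$ . The trial shapes $Y_{1}=x(1-x)$ and $Y_{2}=x^{2}(1-x)^{2}$ are likewise an assumption; they give $K_{ij}=\int _{0}^{1}Y_{i}'Y_{j}'\,dx$ and $M_{ij}=\int _{0}^{1}Y_{i}Y_{j}\,dx$ . The generalized problem is reduced to a standard symmetric one through a Cholesky factorization of the mass matrix.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Assumed model problem: string of unit length, tension and density, fixed ends.
x = Polynomial([0.0, 1.0])
trial = [x * (1 - x), (x * (1 - x)) ** 2]      # assumed trial shapes Y_1, Y_2

def integrate01(p):
    """Integral of a polynomial over [0, 1]."""
    P = p.integ()
    return P(1.0) - P(0.0)

n = len(trial)
# Stiffness and mass matrices of the quadratic forms B[Y] and A[Y]
K = np.array([[integrate01(trial[i].deriv() * trial[j].deriv()) for j in range(n)]
              for i in range(n)])
M = np.array([[integrate01(trial[i] * trial[j]) for j in range(n)]
              for i in range(n)])

# Solve K c = omega^2 M c: with M = L L^T and z = L^T c this becomes
# (L^-1 K L^-T) z = omega^2 z, a standard symmetric eigenvalue problem.
L = np.linalg.cholesky(M)
Linv = np.linalg.inv(L)
omega2, Z = np.linalg.eigh(Linv @ K @ Linv.T)
C = Linv.T @ Z                                  # columns are coefficient vectors c

print(np.sqrt(omega2[0]), np.pi)                # lowest Ritz frequency vs exact value
```

Under these assumptions the lowest Ritz frequency should lie slightly above the exact value $\pi$ , since the minimized Rayleigh quotient bounds the lowest eigenvalue from above.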

Simple case of double spring-mass system

The following discussion uses the simplest case, where the system has two lumped springs and two lumped masses, and only two mode shapes are assumed. Hence M = [m₁, m₂] and K = [k₁, k₂].

A mode shape is assumed for the system, with two terms, one of which is weighted by a factor B, e.g. Y = [1, 1] + B[1, −1]. Simple harmonic motion theory says that the velocity at the time when deflection is zero, is the angular frequency $\omega$ times the deflection (y) at time of maximum deflection. In this example the kinetic energy (KE) for each mass is ${\frac {1}{2}}\omega ^{2}Y_{1}^{2}m_{1}$ etc., and the potential energy (PE) for each spring is ${\frac {1}{2}}k_{1}Y_{1}^{2}$ etc.

We also know that without damping, the maximal KE equals the maximal PE. Thus,

\sum _{i=1}^{2}\left({\frac {1}{2}}\omega ^{2}Y_{i}^{2}M_{i}\right)=\sum _{i=1}^{2}\left({\frac {1}{2}}K_{i}Y_{i}^{2}\right)

Note that the overall amplitude of the mode shape cancels out from each side, always. That is, the actual size of the assumed deflection does not matter, just the mode shape.

Mathematical manipulations then yield an expression for $\omega$ in terms of B, which can be differentiated with respect to B to find the minimum, i.e. where $d\omega /dB=0$ . This gives the value of B for which $\omega$ is lowest. Because the mode shape is assumed, the result is an upper bound on the fundamental frequency of the system; it is the lowest upper bound available under our assumptions, because B is used to find the optimal 'mix' of the two assumed mode shape functions.

There are many tricks with this method; the most important is to choose realistic assumed mode shapes. For example, in the case of beam deflection problems it is wise to use a deformed shape that is analytically similar to the expected solution. A quartic may fit most of the easy problems of simply linked beams even if the order of the deformed solution may be lower. The springs and masses do not have to be discrete; they can be continuous (or a mixture), and this method can be easily used in a spreadsheet to find the natural frequencies of quite complex distributed systems, provided that the distributed KE and PE terms can be described easily or the continuous elements can be broken up into discrete parts.

This method could be used iteratively, adding additional mode shapes to the previous best solution, or you can build up a long expression with many Bs and many mode shapes, and then differentiate them partially.

Notes and references

[1] Leissa, A.W. (2005). "The historical bases of the Rayleigh and Ritz methods". Journal of Sound and Vibration. 287 (4–5): 961–978. Bibcode:2005JSV...287..961L. doi:10.1016/j.jsv.2004.12.021.
[2] Ilanko, Sinniah (2009). "Comments on the historical bases of the Rayleigh and Ritz methods". Journal of Sound and Vibration. 319 (1–2): 731–733. doi:10.1016/j.jsv.2008.06.001.
[3] Trefethen, Lloyd N.; Bau, III, David (1997). Numerical Linear Algebra. SIAM. p. 254. ISBN 978-0-89871-957-4.

External links

Course on Calculus of Variations, has a section on Rayleigh–Ritz method.
