In numerical linear algebra, the conjugate gradient method is an iterative method for numerically solving the linear system
{\displaystyle {\boldsymbol {Ax}}={\boldsymbol {b}}}
where {\displaystyle {\boldsymbol {A}}} is symmetric positive-definite. The conjugate gradient method can be derived from several different perspectives, including specialization of the conjugate direction method[1] for optimization, and variation of the Arnoldi/Lanczos iteration for eigenvalue problems.
The intent of this article is to document the important steps in these derivations.
Derivation from the conjugate direction method
The conjugate gradient method can be seen as a special case of the conjugate direction method applied to minimization of the quadratic function
{\displaystyle f({\boldsymbol {x}})={\boldsymbol {x}}^{\mathrm {T} }{\boldsymbol {A}}{\boldsymbol {x}}-2{\boldsymbol {b}}^{\mathrm {T} }{\boldsymbol {x}}{\text{.}}}
Since {\displaystyle {\boldsymbol {A}}} is symmetric, the gradient of this function is {\displaystyle 2({\boldsymbol {Ax}}-{\boldsymbol {b}})}, so its unique minimizer (by positive-definiteness) is the solution of {\displaystyle {\boldsymbol {Ax}}={\boldsymbol {b}}}.
The conjugate direction method
In the conjugate direction method for minimizing
{\displaystyle f({\boldsymbol {x}})={\boldsymbol {x}}^{\mathrm {T} }{\boldsymbol {A}}{\boldsymbol {x}}-2{\boldsymbol {b}}^{\mathrm {T} }{\boldsymbol {x}}{\text{,}}}
one starts with an initial guess {\displaystyle {\boldsymbol {x}}_{0}} and the corresponding residual {\displaystyle {\boldsymbol {r}}_{0}={\boldsymbol {b}}-{\boldsymbol {Ax}}_{0}}, and computes the iterate and residual by the formulae
{\displaystyle {\begin{aligned}\alpha _{i}&={\frac {{\boldsymbol {p}}_{i}^{\mathrm {T} }{\boldsymbol {r}}_{i}}{{\boldsymbol {p}}_{i}^{\mathrm {T} }{\boldsymbol {Ap}}_{i}}}{\text{,}}\\{\boldsymbol {x}}_{i+1}&={\boldsymbol {x}}_{i}+\alpha _{i}{\boldsymbol {p}}_{i}{\text{,}}\\{\boldsymbol {r}}_{i+1}&={\boldsymbol {r}}_{i}-\alpha _{i}{\boldsymbol {Ap}}_{i}\end{aligned}}}
where {\displaystyle {\boldsymbol {p}}_{0},{\boldsymbol {p}}_{1},{\boldsymbol {p}}_{2},\ldots } are a series of mutually conjugate directions, i.e.,
{\displaystyle {\boldsymbol {p}}_{i}^{\mathrm {T} }{\boldsymbol {Ap}}_{j}=0}
for any {\displaystyle i\neq j}.
The conjugate direction method is imprecise in the sense that no formulae are given for selection of the directions {\displaystyle {\boldsymbol {p}}_{0},{\boldsymbol {p}}_{1},{\boldsymbol {p}}_{2},\ldots }. Specific choices lead to various methods including the conjugate gradient method and Gaussian elimination. A minimal Python/NumPy sketch of the iteration is shown below.
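The following sketch is illustrative and not from the source: the function name conjugate_direction, the small test matrix, and the use of eigenvectors of a symmetric matrix as a ready-made set of mutually conjugate directions are all assumptions made here for demonstration.

```python
import numpy as np

def conjugate_direction(A, b, x0, directions):
    """Minimal sketch (illustrative) of the conjugate direction method.

    `directions` is assumed to contain mutually A-conjugate vectors
    p_0, p_1, ...; the updates follow the formulae above."""
    x = x0.astype(float)
    r = b - A @ x                      # r_0 = b - A x_0
    for p in directions:
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)     # alpha_i = p_i^T r_i / p_i^T A p_i
        x = x + alpha * p              # x_{i+1} = x_i + alpha_i p_i
        r = r - alpha * Ap             # r_{i+1} = r_i - alpha_i A p_i
    return x

# Illustrative test problem (assumption): eigenvectors of a symmetric
# matrix are mutually A-conjugate, so they serve as valid directions.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
_, eigvecs = np.linalg.eigh(A)
x = conjugate_direction(A, b, np.zeros(2), eigvecs.T)
print(np.allclose(A @ x, b))           # True: exact solution after n conjugate steps
```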
Derivation from the Arnoldi/Lanczos iteration
The conjugate gradient method can also be seen as a variant of the Arnoldi/Lanczos iteration applied to solving linear systems.
The general Arnoldi method
In the Arnoldi iteration, one starts with a vector {\displaystyle {\boldsymbol {r}}_{0}} and gradually builds an orthonormal basis {\displaystyle \{{\boldsymbol {v}}_{1},{\boldsymbol {v}}_{2},{\boldsymbol {v}}_{3},\ldots \}} of the Krylov subspace
{\displaystyle {\mathcal {K}}({\boldsymbol {A}},{\boldsymbol {r}}_{0})=\mathrm {span} \{{\boldsymbol {r}}_{0},{\boldsymbol {Ar}}_{0},{\boldsymbol {A}}^{2}{\boldsymbol {r}}_{0},\ldots \}}
by defining {\displaystyle {\boldsymbol {v}}_{i}={\boldsymbol {w}}_{i}/\lVert {\boldsymbol {w}}_{i}\rVert _{2}}
where
{\displaystyle {\boldsymbol {w}}_{i}={\begin{cases}{\boldsymbol {r}}_{0}&{\text{if }}i=1{\text{,}}\\{\boldsymbol {Av}}_{i-1}-\sum _{j=1}^{i-1}({\boldsymbol {v}}_{j}^{\mathrm {T} }{\boldsymbol {Av}}_{i-1}){\boldsymbol {v}}_{j}&{\text{if }}i>1{\text{.}}\end{cases}}}
In other words, for {\displaystyle i>1}, {\displaystyle {\boldsymbol {v}}_{i}} is found by Gram–Schmidt orthogonalizing {\displaystyle {\boldsymbol {Av}}_{i-1}} against {\displaystyle \{{\boldsymbol {v}}_{1},{\boldsymbol {v}}_{2},\ldots ,{\boldsymbol {v}}_{i-1}\}} followed by normalization.
Put in matrix form, the iteration is captured by the equation
{\displaystyle {\boldsymbol {AV}}_{i}={\boldsymbol {V}}_{i+1}{\boldsymbol {\tilde {H}}}_{i}}
where
{\displaystyle {\begin{aligned}{\boldsymbol {V}}_{i}&={\begin{bmatrix}{\boldsymbol {v}}_{1}&{\boldsymbol {v}}_{2}&\cdots &{\boldsymbol {v}}_{i}\end{bmatrix}}{\text{,}}\\{\boldsymbol {\tilde {H}}}_{i}&={\begin{bmatrix}h_{11}&h_{12}&h_{13}&\cdots &h_{1,i}\\h_{21}&h_{22}&h_{23}&\cdots &h_{2,i}\\&h_{32}&h_{33}&\cdots &h_{3,i}\\&&\ddots &\ddots &\vdots \\&&&h_{i,i-1}&h_{i,i}\\&&&&h_{i+1,i}\end{bmatrix}}={\begin{bmatrix}{\boldsymbol {H}}_{i}\\h_{i+1,i}{\boldsymbol {e}}_{i}^{\mathrm {T} }\end{bmatrix}}\end{aligned}}}
with
{\displaystyle h_{ji}={\begin{cases}{\boldsymbol {v}}_{j}^{\mathrm {T} }{\boldsymbol {Av}}_{i}&{\text{if }}j\leq i{\text{,}}\\\lVert {\boldsymbol {w}}_{i+1}\rVert _{2}&{\text{if }}j=i+1{\text{,}}\\0&{\text{if }}j>i+1{\text{.}}\end{cases}}}
When applying the Arnoldi iteration to solving linear systems, one starts with
{\displaystyle {\boldsymbol {r}}_{0}={\boldsymbol {b}}-{\boldsymbol {Ax}}_{0}}, the residual corresponding to an initial guess {\displaystyle {\boldsymbol {x}}_{0}}. After each step of iteration, one computes {\displaystyle {\boldsymbol {y}}_{i}={\boldsymbol {H}}_{i}^{-1}(\lVert {\boldsymbol {r}}_{0}\rVert _{2}{\boldsymbol {e}}_{1})} and the new iterate {\displaystyle {\boldsymbol {x}}_{i}={\boldsymbol {x}}_{0}+{\boldsymbol {V}}_{i}{\boldsymbol {y}}_{i}}.
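A minimal Python/NumPy sketch of this procedure is shown below; the function name arnoldi_solve, the fixed number of steps, and the small test problem are illustrative assumptions rather than part of the source. It builds the basis V and the extended Hessenberg matrix column by column, then forms the iterate from the leading square block H_i.

```python
import numpy as np

def arnoldi_solve(A, b, x0, steps):
    """Sketch (illustrative): Arnoldi iteration applied to A x = b."""
    n = len(b)
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V = np.zeros((n, steps + 1))
    H = np.zeros((steps + 1, steps))             # extended Hessenberg matrix
    V[:, 0] = r0 / beta                          # v_1 = r_0 / ||r_0||_2
    for i in range(steps):
        w = A @ V[:, i]
        for j in range(i + 1):                   # Gram-Schmidt against v_1..v_i
            H[j, i] = V[:, j] @ w
            w = w - H[j, i] * V[:, j]
        H[i + 1, i] = np.linalg.norm(w)          # h_{i+1,i} = ||w_{i+1}||_2
        if H[i + 1, i] != 0:
            V[:, i + 1] = w / H[i + 1, i]
    # y_i = H_i^{-1} (||r_0||_2 e_1),  x_i = x_0 + V_i y_i
    rhs = np.zeros(steps)
    rhs[0] = beta
    y = np.linalg.solve(H[:steps, :steps], rhs)
    return x0 + V[:, :steps] @ y

# Illustrative test problem (assumption)
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = arnoldi_solve(A, b, np.zeros(2), steps=2)
print(np.allclose(A @ x, b))                     # True once the Krylov space is exhausted
```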
The direct Lanczos method
For the rest of the discussion, we assume that {\displaystyle {\boldsymbol {A}}} is symmetric positive-definite. With symmetry of {\displaystyle {\boldsymbol {A}}}, the upper Hessenberg matrix {\displaystyle {\boldsymbol {H}}_{i}={\boldsymbol {V}}_{i}^{\mathrm {T} }{\boldsymbol {AV}}_{i}} becomes symmetric and thus tridiagonal. It then can be more clearly denoted by
{\displaystyle {\boldsymbol {H}}_{i}={\begin{bmatrix}a_{1}&b_{2}\\b_{2}&a_{2}&b_{3}\\&\ddots &\ddots &\ddots \\&&b_{i-1}&a_{i-1}&b_{i}\\&&&b_{i}&a_{i}\end{bmatrix}}{\text{.}}}
This enables a short three-term recurrence for {\displaystyle {\boldsymbol {v}}_{i}} in the iteration, and the Arnoldi iteration is reduced to the Lanczos iteration.
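For illustration (not from the source), the resulting three-term recurrence can be sketched as follows; the function name lanczos, its interface, and the test values are assumptions. The check at the end verifies that V_i^T A V_i is tridiagonal with diagonal entries a_j and off-diagonal entries b_j.

```python
import numpy as np

def lanczos(A, r0, steps):
    """Sketch (illustrative): Lanczos iteration for symmetric A, returning the
    orthonormal basis V and the tridiagonal entries a (diagonal) and b (off-diagonal)."""
    n = len(r0)
    V = np.zeros((n, steps))
    a = np.zeros(steps)
    b = np.zeros(steps)                    # b[0] unused
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(steps):
        w = A @ V[:, j]
        a[j] = V[:, j] @ w                 # diagonal entry
        w = w - a[j] * V[:, j]
        if j > 0:
            w = w - b[j] * V[:, j - 1]     # only the previous two basis vectors are needed
        if j + 1 < steps:
            b[j + 1] = np.linalg.norm(w)
            V[:, j + 1] = w / b[j + 1]
    return V, a, b

# Illustrative check (assumption): V^T A V matches the tridiagonal matrix above
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
V, a, b = lanczos(A, np.array([1.0, 0.0, 0.0]), 3)
H = V.T @ A @ V
print(np.allclose(H, np.diag(a) + np.diag(b[1:], 1) + np.diag(b[1:], -1)))  # True
```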
Since {\displaystyle {\boldsymbol {A}}} is symmetric positive-definite, so is {\displaystyle {\boldsymbol {H}}_{i}}. Hence, {\displaystyle {\boldsymbol {H}}_{i}} can be LU factorized without partial pivoting into
{\displaystyle {\boldsymbol {H}}_{i}={\boldsymbol {L}}_{i}{\boldsymbol {U}}_{i}={\begin{bmatrix}1\\c_{2}&1\\&\ddots &\ddots \\&&c_{i-1}&1\\&&&c_{i}&1\end{bmatrix}}{\begin{bmatrix}d_{1}&b_{2}\\&d_{2}&b_{3}\\&&\ddots &\ddots \\&&&d_{i-1}&b_{i}\\&&&&d_{i}\end{bmatrix}}}
with convenient recurrences for {\displaystyle c_{i}} and {\displaystyle d_{i}}:
{\displaystyle {\begin{aligned}c_{i}&=b_{i}/d_{i-1}{\text{,}}\\d_{i}&={\begin{cases}a_{1}&{\text{if }}i=1{\text{,}}\\a_{i}-c_{i}b_{i}&{\text{if }}i>1{\text{.}}\end{cases}}\end{aligned}}}
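A short sketch of these recurrences (the helper name tridiag_lu and the test values are illustrative assumptions) builds the factors in one pass and checks them against the tridiagonal matrix:

```python
import numpy as np

def tridiag_lu(a, b):
    """Sketch (illustrative): LU factors of the tridiagonal H_i via
    d_1 = a_1, c_i = b_i / d_{i-1}, d_i = a_i - c_i b_i (1-based indices)."""
    m = len(a)
    c = np.zeros(m)                  # c[0] unused
    d = np.zeros(m)
    d[0] = a[0]
    for k in range(1, m):
        c[k] = b[k] / d[k - 1]
        d[k] = a[k] - c[k] * b[k]
    return c, d

# Illustrative check (assumption): L U reproduces the tridiagonal matrix
a = np.array([4.0, 3.0, 2.0]); b = np.array([0.0, 1.0, 1.0])
c, d = tridiag_lu(a, b)
H = np.diag(a) + np.diag(b[1:], 1) + np.diag(b[1:], -1)
L = np.eye(3) + np.diag(c[1:], -1)
U = np.diag(d) + np.diag(b[1:], 1)
print(np.allclose(L @ U, H))         # True
```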
Rewrite {\displaystyle {\boldsymbol {x}}_{i}={\boldsymbol {x}}_{0}+{\boldsymbol {V}}_{i}{\boldsymbol {y}}_{i}} as
{\displaystyle {\begin{aligned}{\boldsymbol {x}}_{i}&={\boldsymbol {x}}_{0}+{\boldsymbol {V}}_{i}{\boldsymbol {H}}_{i}^{-1}(\lVert {\boldsymbol {r}}_{0}\rVert _{2}{\boldsymbol {e}}_{1})\\&={\boldsymbol {x}}_{0}+{\boldsymbol {V}}_{i}{\boldsymbol {U}}_{i}^{-1}{\boldsymbol {L}}_{i}^{-1}(\lVert {\boldsymbol {r}}_{0}\rVert _{2}{\boldsymbol {e}}_{1})\\&={\boldsymbol {x}}_{0}+{\boldsymbol {P}}_{i}{\boldsymbol {z}}_{i}\end{aligned}}}
with
{\displaystyle {\begin{aligned}{\boldsymbol {P}}_{i}&={\boldsymbol {V}}_{i}{\boldsymbol {U}}_{i}^{-1}{\text{,}}\\{\boldsymbol {z}}_{i}&={\boldsymbol {L}}_{i}^{-1}(\lVert {\boldsymbol {r}}_{0}\rVert _{2}{\boldsymbol {e}}_{1}){\text{.}}\end{aligned}}}
It is now important to observe that
{\displaystyle {\begin{aligned}{\boldsymbol {P}}_{i}&={\begin{bmatrix}{\boldsymbol {P}}_{i-1}&{\boldsymbol {p}}_{i}\end{bmatrix}}{\text{,}}\\{\boldsymbol {z}}_{i}&={\begin{bmatrix}{\boldsymbol {z}}_{i-1}\\\zeta _{i}\end{bmatrix}}{\text{.}}\end{aligned}}}
In fact, there are short recurrences for {\displaystyle {\boldsymbol {p}}_{i}} and {\displaystyle \zeta _{i}} as well:
{\displaystyle {\begin{aligned}{\boldsymbol {p}}_{i}&={\frac {1}{d_{i}}}({\boldsymbol {v}}_{i}-b_{i}{\boldsymbol {p}}_{i-1}){\text{,}}\\\zeta _{i}&=-c_{i}\zeta _{i-1}{\text{.}}\end{aligned}}}
With this formulation, we arrive at a simple recurrence for {\displaystyle {\boldsymbol {x}}_{i}}:
{\displaystyle {\begin{aligned}{\boldsymbol {x}}_{i}&={\boldsymbol {x}}_{0}+{\boldsymbol {P}}_{i}{\boldsymbol {z}}_{i}\\&={\boldsymbol {x}}_{0}+{\boldsymbol {P}}_{i-1}{\boldsymbol {z}}_{i-1}+\zeta _{i}{\boldsymbol {p}}_{i}\\&={\boldsymbol {x}}_{i-1}+\zeta _{i}{\boldsymbol {p}}_{i}{\text{.}}\end{aligned}}}
The relations above straightforwardly lead to the direct Lanczos method, which turns out to be slightly more complex than the conjugate gradient method derived below.
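As an illustration, here is a minimal sketch of the direct Lanczos method under the assumptions above; the function name direct_lanczos and the test problem are assumptions, and the initializations ζ_1 = ||r_0||_2 and p_0 = 0 follow from the definitions of z_i and P_i.

```python
import numpy as np

def direct_lanczos(A, b, x0, steps):
    """Sketch (illustrative) of the direct Lanczos method for symmetric positive-definite A."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    v = r0 / beta                        # v_1
    v_prev = np.zeros_like(v)
    b_i = 0.0                            # off-diagonal entry b_i (b_1 unused)
    d_prev = 1.0
    p_prev = np.zeros_like(v)            # p_0 = 0
    zeta = beta                          # zeta_1 = ||r_0||_2, first entry of L_i^{-1}(||r_0|| e_1)
    x = x0.astype(float)
    for i in range(1, steps + 1):
        w = A @ v
        a_i = v @ w                      # a_i = v_i^T A v_i
        if i == 1:
            d_i = a_i                    # d_1 = a_1
        else:
            c_i = b_i / d_prev           # c_i = b_i / d_{i-1}
            d_i = a_i - c_i * b_i        # d_i = a_i - c_i b_i
            zeta = -c_i * zeta           # zeta_i = -c_i zeta_{i-1}
        p = (v - b_i * p_prev) / d_i     # p_i = (v_i - b_i p_{i-1}) / d_i
        x = x + zeta * p                 # x_i = x_{i-1} + zeta_i p_i
        # Lanczos three-term recurrence for the next basis vector
        w = w - a_i * v - b_i * v_prev
        b_next = np.linalg.norm(w)
        if b_next == 0:
            break
        v_prev, v = v, w / b_next
        b_i, d_prev, p_prev = b_next, d_i, p
    return x

# Illustrative test problem (assumption)
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(np.allclose(A @ direct_lanczos(A, b, np.zeros(2), 2), b))   # True
```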
The conjugate gradient method from imposing orthogonality and conjugacy
If we allow {\displaystyle {\boldsymbol {p}}_{i}} to be rescaled and compensate for the scaling in the constant factors, we can potentially obtain simpler recurrences of the form:
{\displaystyle {\begin{aligned}{\boldsymbol {x}}_{i}&={\boldsymbol {x}}_{i-1}+\alpha _{i-1}{\boldsymbol {p}}_{i-1}{\text{,}}\\{\boldsymbol {r}}_{i}&={\boldsymbol {r}}_{i-1}-\alpha _{i-1}{\boldsymbol {Ap}}_{i-1}{\text{,}}\\{\boldsymbol {p}}_{i}&={\boldsymbol {r}}_{i}+\beta _{i-1}{\boldsymbol {p}}_{i-1}{\text{.}}\end{aligned}}}
As premises for the simplification, we now derive the orthogonality of {\displaystyle {\boldsymbol {r}}_{i}} and conjugacy of {\displaystyle {\boldsymbol {p}}_{i}}, i.e., for {\displaystyle i\neq j},
{\displaystyle {\begin{aligned}{\boldsymbol {r}}_{i}^{\mathrm {T} }{\boldsymbol {r}}_{j}&=0{\text{,}}\\{\boldsymbol {p}}_{i}^{\mathrm {T} }{\boldsymbol {Ap}}_{j}&=0{\text{.}}\end{aligned}}}
The residuals are mutually orthogonal because {\displaystyle {\boldsymbol {r}}_{i}} is essentially a multiple of {\displaystyle {\boldsymbol {v}}_{i+1}}: for {\displaystyle i=0}, {\displaystyle {\boldsymbol {r}}_{0}=\lVert {\boldsymbol {r}}_{0}\rVert _{2}{\boldsymbol {v}}_{1}}, while for {\displaystyle i>0},
{\displaystyle {\begin{aligned}{\boldsymbol {r}}_{i}&={\boldsymbol {b}}-{\boldsymbol {Ax}}_{i}\\&={\boldsymbol {b}}-{\boldsymbol {A}}({\boldsymbol {x}}_{0}+{\boldsymbol {V}}_{i}{\boldsymbol {y}}_{i})\\&={\boldsymbol {r}}_{0}-{\boldsymbol {AV}}_{i}{\boldsymbol {y}}_{i}\\&={\boldsymbol {r}}_{0}-{\boldsymbol {V}}_{i+1}{\boldsymbol {\tilde {H}}}_{i}{\boldsymbol {y}}_{i}\\&={\boldsymbol {r}}_{0}-{\boldsymbol {V}}_{i}{\boldsymbol {H}}_{i}{\boldsymbol {y}}_{i}-h_{i+1,i}({\boldsymbol {e}}_{i}^{\mathrm {T} }{\boldsymbol {y}}_{i}){\boldsymbol {v}}_{i+1}\\&=\lVert {\boldsymbol {r}}_{0}\rVert _{2}{\boldsymbol {v}}_{1}-{\boldsymbol {V}}_{i}(\lVert {\boldsymbol {r}}_{0}\rVert _{2}{\boldsymbol {e}}_{1})-h_{i+1,i}({\boldsymbol {e}}_{i}^{\mathrm {T} }{\boldsymbol {y}}_{i}){\boldsymbol {v}}_{i+1}\\&=-h_{i+1,i}({\boldsymbol {e}}_{i}^{\mathrm {T} }{\boldsymbol {y}}_{i}){\boldsymbol {v}}_{i+1}{\text{.}}\end{aligned}}}
To see the conjugacy of {\displaystyle {\boldsymbol {p}}_{i}}, it suffices to show that {\displaystyle {\boldsymbol {P}}_{i}^{\mathrm {T} }{\boldsymbol {AP}}_{i}} is diagonal:
{\displaystyle {\begin{aligned}{\boldsymbol {P}}_{i}^{\mathrm {T} }{\boldsymbol {AP}}_{i}&={\boldsymbol {U}}_{i}^{-\mathrm {T} }{\boldsymbol {V}}_{i}^{\mathrm {T} }{\boldsymbol {AV}}_{i}{\boldsymbol {U}}_{i}^{-1}\\&={\boldsymbol {U}}_{i}^{-\mathrm {T} }{\boldsymbol {H}}_{i}{\boldsymbol {U}}_{i}^{-1}\\&={\boldsymbol {U}}_{i}^{-\mathrm {T} }{\boldsymbol {L}}_{i}{\boldsymbol {U}}_{i}{\boldsymbol {U}}_{i}^{-1}\\&={\boldsymbol {U}}_{i}^{-\mathrm {T} }{\boldsymbol {L}}_{i}\end{aligned}}}
is symmetric and lower triangular simultaneously and thus must be diagonal.
Now we can derive the constant factors {\displaystyle \alpha _{i}} and {\displaystyle \beta _{i}} with respect to the scaled {\displaystyle {\boldsymbol {p}}_{i}} by solely imposing the orthogonality of {\displaystyle {\boldsymbol {r}}_{i}} and conjugacy of {\displaystyle {\boldsymbol {p}}_{i}}.
Due to the orthogonality of {\displaystyle {\boldsymbol {r}}_{i}}, it is necessary that {\displaystyle {\boldsymbol {r}}_{i+1}^{\mathrm {T} }{\boldsymbol {r}}_{i}=({\boldsymbol {r}}_{i}-\alpha _{i}{\boldsymbol {Ap}}_{i})^{\mathrm {T} }{\boldsymbol {r}}_{i}=0}. As a result,
{\displaystyle {\begin{aligned}\alpha _{i}&={\frac {{\boldsymbol {r}}_{i}^{\mathrm {T} }{\boldsymbol {r}}_{i}}{{\boldsymbol {r}}_{i}^{\mathrm {T} }{\boldsymbol {Ap}}_{i}}}\\&={\frac {{\boldsymbol {r}}_{i}^{\mathrm {T} }{\boldsymbol {r}}_{i}}{({\boldsymbol {p}}_{i}-\beta _{i-1}{\boldsymbol {p}}_{i-1})^{\mathrm {T} }{\boldsymbol {Ap}}_{i}}}\\&={\frac {{\boldsymbol {r}}_{i}^{\mathrm {T} }{\boldsymbol {r}}_{i}}{{\boldsymbol {p}}_{i}^{\mathrm {T} }{\boldsymbol {Ap}}_{i}}}{\text{.}}\end{aligned}}}
Similarly, due to the conjugacy of {\displaystyle {\boldsymbol {p}}_{i}}, it is necessary that {\displaystyle {\boldsymbol {p}}_{i+1}^{\mathrm {T} }{\boldsymbol {Ap}}_{i}=({\boldsymbol {r}}_{i+1}+\beta _{i}{\boldsymbol {p}}_{i})^{\mathrm {T} }{\boldsymbol {Ap}}_{i}=0}. As a result,
{\displaystyle {\begin{aligned}\beta _{i}&=-{\frac {{\boldsymbol {r}}_{i+1}^{\mathrm {T} }{\boldsymbol {Ap}}_{i}}{{\boldsymbol {p}}_{i}^{\mathrm {T} }{\boldsymbol {Ap}}_{i}}}\\&=-{\frac {{\boldsymbol {r}}_{i+1}^{\mathrm {T} }({\boldsymbol {r}}_{i}-{\boldsymbol {r}}_{i+1})}{\alpha _{i}{\boldsymbol {p}}_{i}^{\mathrm {T} }{\boldsymbol {Ap}}_{i}}}\\&={\frac {{\boldsymbol {r}}_{i+1}^{\mathrm {T} }{\boldsymbol {r}}_{i+1}}{{\boldsymbol {r}}_{i}^{\mathrm {T} }{\boldsymbol {r}}_{i}}}{\text{.}}\end{aligned}}}
This completes the derivation.
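Putting the derived recurrences together yields the usual conjugate gradient iteration. The sketch below is one straightforward NumPy realization of these formulas; the function name conjugate_gradient, the stopping tolerance, the iteration cap, and the test problem are illustrative choices rather than part of the derivation.

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    """Sketch (illustrative) of the conjugate gradient method using the derived
    formulas alpha_i = r_i^T r_i / p_i^T A p_i and beta_i = r_{i+1}^T r_{i+1} / r_i^T r_i."""
    x = x0.astype(float)
    r = b - A @ x                     # r_0 = b - A x_0
    p = r.copy()                      # p_0 = r_0
    rs = r @ r
    if max_iter is None:
        max_iter = len(b)
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)         # alpha_i = r_i^T r_i / p_i^T A p_i
        x = x + alpha * p             # x_{i+1} = x_i + alpha_i p_i
        r = r - alpha * Ap            # r_{i+1} = r_i - alpha_i A p_i
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        beta = rs_new / rs            # beta_i = r_{i+1}^T r_{i+1} / r_i^T r_i
        p = r + beta * p              # p_{i+1} = r_{i+1} + beta_i p_i
        rs = rs_new
    return x

# Illustrative test problem (assumption)
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b, np.zeros(2))
print(np.allclose(A @ x, b))          # True; x is approximately [0.0909, 0.6364]
```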
References
Hestenes, M. R.; Stiefel, E. (December 1952). "Methods of conjugate gradients for solving linear systems" (PDF). Journal of Research of the National Bureau of Standards. 49 (6): 409. doi:10.6028/jres.049.044.
Saad, Y. (2003). "Chapter 6: Krylov Subspace Methods, Part I". Iterative Methods for Sparse Linear Systems (2nd ed.). SIAM. ISBN 978-0-89871-534-7.