In mathematics, block matrix pseudoinverse is a formula of pseudoinverse of a partitioned matrix . This is useful for decomposing or approximating many algorithms updating parameters in signal processing , which are based on least squares method.
Derivation
Consider a column-wise partitioned matrix:
[
A
,
B
]
,
A
∈
R
n
×
m
,
B
∈
R
p
×
m
,
m
≥
n
+
p
.
{\displaystyle [\mathbf {A} ,\mathbf {B} ],\qquad \mathbf {A} \in \mathbb {R} ^{n\times m},\qquad \mathbf {B} \in \mathbb {R} ^{p\times m},\qquad m\geq n+p.}
If the above matrix is full rank, the pseudoinverse matrices of it and its transpose are as follows.
[
A
,
B
]
+
=
(
[
A
,
B
]
T
[
A
,
B
]
)
−
1
[
A
,
B
]
T
,
{\displaystyle {\begin{bmatrix}\mathbf {A} ,&\mathbf {B} \end{bmatrix}}^{+}=([\mathbf {A} ,\mathbf {B} ]^{T}[\mathbf {A} ,\mathbf {B} ])^{-1}[\mathbf {A} ,\mathbf {B} ]^{T},}
[
A
T
B
T
]
+
=
[
A
,
B
]
(
[
A
,
B
]
T
[
A
,
B
]
)
−
1
.
{\displaystyle {\begin{bmatrix}\mathbf {A} ^{T}\\\mathbf {B} ^{T}\end{bmatrix}}^{+}=[\mathbf {A} ,\mathbf {B} ]([\mathbf {A} ,\mathbf {B} ]^{T}[\mathbf {A} ,\mathbf {B} ])^{-1}.}
The pseudoinverse requires (n + p )-square matrix inversion.
To reduce complexity and introduce parallelism, we derive the following decomposed formula. From a block matrix inverse
(
[
A
,
B
]
T
[
A
,
B
]
)
−
1
{\displaystyle \mathbf {(} [\mathbf {A} ,\mathbf {B} ]^{T}[\mathbf {A} ,\mathbf {B} ])^{-1}}
, we can have[citation needed ]
[
A
,
B
]
+
=
[
P
B
⊥
A
(
A
T
P
B
⊥
A
)
−
1
,
P
A
⊥
B
(
B
T
P
A
⊥
B
)
−
1
]
T
,
{\displaystyle {\begin{bmatrix}\mathbf {A} ,&\mathbf {B} \end{bmatrix}}^{+}=\left[\mathbf {P} _{B}^{\perp }\mathbf {A} (\mathbf {A} ^{T}\mathbf {P} _{B}^{\perp }\mathbf {A} )^{-1},\quad \mathbf {P} _{A}^{\perp }\mathbf {B} (\mathbf {B} ^{T}\mathbf {P} _{A}^{\perp }\mathbf {B} )^{-1}\right]^{T},}
[
A
T
B
T
]
+
=
[
P
B
⊥
A
(
A
T
P
B
⊥
A
)
−
1
,
P
A
⊥
B
(
B
T
P
A
⊥
B
)
−
1
]
,
{\displaystyle {\begin{bmatrix}\mathbf {A} ^{T}\\\mathbf {B} ^{T}\end{bmatrix}}^{+}=\left[\mathbf {P} _{B}^{\perp }\mathbf {A} (\mathbf {A} ^{T}\mathbf {P} _{B}^{\perp }\mathbf {A} )^{-1},\quad \mathbf {P} _{A}^{\perp }\mathbf {B} (\mathbf {B} ^{T}\mathbf {P} _{A}^{\perp }\mathbf {B} )^{-1}\right],}
where orthogonal projection matrices are defined by
P
A
⊥
=
I
−
A
(
A
T
A
)
−
1
A
T
,
P
B
⊥
=
I
−
B
(
B
T
B
)
−
1
B
T
.
{\displaystyle {\begin{aligned}\mathbf {P} _{A}^{\perp }&=\mathbf {I} -\mathbf {A} (\mathbf {A} ^{T}\mathbf {A} )^{-1}\mathbf {A} ^{T},\\\mathbf {P} _{B}^{\perp }&=\mathbf {I} -\mathbf {B} (\mathbf {B} ^{T}\mathbf {B} )^{-1}\mathbf {B} ^{T}.\end{aligned}}}
Interestingly, from the idempotence of projection matrix, we can verify that the pseudoinverse of block matrix consists of pseudoinverse of projected matrices:
[
A
,
B
]
+
=
[
(
P
B
⊥
A
)
+
(
P
A
⊥
B
)
+
]
,
{\displaystyle {\begin{bmatrix}\mathbf {A} ,&\mathbf {B} \end{bmatrix}}^{+}={\begin{bmatrix}(\mathbf {P} _{B}^{\perp }\mathbf {A} )^{+}\\(\mathbf {P} _{A}^{\perp }\mathbf {B} )^{+}\end{bmatrix}},}
[
A
T
B
T
]
+
=
[
(
A
T
P
B
⊥
)
+
,
(
B
T
P
A
⊥
)
+
]
.
{\displaystyle {\begin{bmatrix}\mathbf {A} ^{T}\\\mathbf {B} ^{T}\end{bmatrix}}^{+}=[(\mathbf {A} ^{T}\mathbf {P} _{B}^{\perp })^{+},\quad (\mathbf {B} ^{T}\mathbf {P} _{A}^{\perp })^{+}].}
Thus, we decomposed the block matrix pseudoinverse into two submatrix pseudoinverses, which cost n - and p -square matrix inversions, respectively.
Note that the above formulae are not necessarily valid if
[
A
,
B
]
{\displaystyle [\mathbf {A} ,\mathbf {B} ]}
does not have full rank – for example, if
A
≠
0
{\displaystyle \mathbf {A} \neq 0}
, then
[
A
,
A
]
+
=
1
2
[
A
+
A
+
]
≠
[
(
P
A
⊥
A
)
+
(
P
A
⊥
A
)
+
]
=
0
{\displaystyle {\begin{bmatrix}\mathbf {A} ,&\mathbf {A} \end{bmatrix}}^{+}={\frac {1}{2}}{\begin{bmatrix}\mathbf {A} ^{+}\\\mathbf {A} ^{+}\end{bmatrix}}\neq {\begin{bmatrix}(\mathbf {P} _{A}^{\perp }\mathbf {A} )^{+}\\(\mathbf {P} _{A}^{\perp }\mathbf {A} )^{+}\end{bmatrix}}=0}
Application to least squares problems
Given the same matrices as above, we consider the following least squares problems, which
appear as multiple objective optimizations or constrained problems in signal processing.
Eventually, we can implement a parallel algorithm for least squares based on the following results.
Column-wise partitioning in over-determined least squares
Suppose a solution
x
=
[
x
1
x
2
]
{\displaystyle \mathbf {x} ={\begin{bmatrix}\mathbf {x} _{1}\\\mathbf {x} _{2}\\\end{bmatrix}}}
solves an over-determined system:
[
A
,
B
]
[
x
1
x
2
]
=
d
,
d
∈
R
m
×
1
.
{\displaystyle {\begin{bmatrix}\mathbf {A} ,&\mathbf {B} \end{bmatrix}}{\begin{bmatrix}\mathbf {x} _{1}\\\mathbf {x} _{2}\\\end{bmatrix}}=\mathbf {d} ,\qquad \mathbf {d} \in \mathbb {R} ^{m\times 1}.}
Using the block matrix pseudoinverse, we have
x
=
[
A
,
B
]
+
d
=
[
(
P
B
⊥
A
)
+
(
P
A
⊥
B
)
+
]
d
.
{\displaystyle \mathbf {x} ={\begin{bmatrix}\mathbf {A} ,&\mathbf {B} \end{bmatrix}}^{+}\,\mathbf {d} ={\begin{bmatrix}(\mathbf {P} _{B}^{\perp }\mathbf {A} )^{+}\\(\mathbf {P} _{A}^{\perp }\mathbf {B} )^{+}\end{bmatrix}}\mathbf {d} .}
Therefore, we have a decomposed solution:
x
1
=
(
P
B
⊥
A
)
+
d
,
x
2
=
(
P
A
⊥
B
)
+
d
.
{\displaystyle \mathbf {x} _{1}=(\mathbf {P} _{B}^{\perp }\mathbf {A} )^{+}\,\mathbf {d} ,\qquad \mathbf {x} _{2}=(\mathbf {P} _{A}^{\perp }\mathbf {B} )^{+}\,\mathbf {d} .}
Row-wise partitioning in under-determined least squares
Suppose a solution
x
{\displaystyle \mathbf {x} }
solves an under-determined system:
[
A
T
B
T
]
x
=
[
e
f
]
,
e
∈
R
n
×
1
,
f
∈
R
p
×
1
.
{\displaystyle {\begin{bmatrix}\mathbf {A} ^{T}\\\mathbf {B} ^{T}\end{bmatrix}}\mathbf {x} ={\begin{bmatrix}\mathbf {e} \\\mathbf {f} \end{bmatrix}},\qquad \mathbf {e} \in \mathbb {R} ^{n\times 1},\qquad \mathbf {f} \in \mathbb {R} ^{p\times 1}.}
The minimum-norm solution is given by
x
=
[
A
T
B
T
]
+
[
e
f
]
.
{\displaystyle \mathbf {x} ={\begin{bmatrix}\mathbf {A} ^{T}\\\mathbf {B} ^{T}\end{bmatrix}}^{+}\,{\begin{bmatrix}\mathbf {e} \\\mathbf {f} \end{bmatrix}}.}
Using the block matrix pseudoinverse, we have
x
=
[
(
A
T
P
B
⊥
)
+
,
(
B
T
P
A
⊥
)
+
]
[
e
f
]
=
(
A
T
P
B
⊥
)
+
e
+
(
B
T
P
A
⊥
)
+
f
.
{\displaystyle \mathbf {x} =[(\mathbf {A} ^{T}\mathbf {P} _{B}^{\perp })^{+},\quad (\mathbf {B} ^{T}\mathbf {P} _{A}^{\perp })^{+}]{\begin{bmatrix}\mathbf {e} \\\mathbf {f} \end{bmatrix}}=(\mathbf {A} ^{T}\mathbf {P} _{B}^{\perp })^{+}\,\mathbf {e} +(\mathbf {B} ^{T}\mathbf {P} _{A}^{\perp })^{+}\,\mathbf {f} .}
Instead of
(
[
A
,
B
]
T
[
A
,
B
]
)
−
1
{\displaystyle \mathbf {(} [\mathbf {A} ,\mathbf {B} ]^{T}[\mathbf {A} ,\mathbf {B} ])^{-1}}
,
we need to calculate directly or indirectly[citation needed ] [original research? ]
(
A
T
A
)
−
1
,
(
B
T
B
)
−
1
,
(
A
T
P
B
⊥
A
)
−
1
,
(
B
T
P
A
⊥
B
)
−
1
.
{\displaystyle \quad (\mathbf {A} ^{T}\mathbf {A} )^{-1},\quad (\mathbf {B} ^{T}\mathbf {B} )^{-1},\quad (\mathbf {A} ^{T}\mathbf {P} _{B}^{\perp }\mathbf {A} )^{-1},\quad (\mathbf {B} ^{T}\mathbf {P} _{A}^{\perp }\mathbf {B} )^{-1}.}
In a dense and small system, we can use singular value decomposition , QR decomposition , or Cholesky decomposition to replace the matrix inversions with numerical routines. In a large system, we may employ iterative methods such as Krylov subspace methods.
Considering parallel algorithms , we can compute
(
A
T
A
)
−
1
{\displaystyle (\mathbf {A} ^{T}\mathbf {A} )^{-1}}
and
(
B
T
B
)
−
1
{\displaystyle (\mathbf {B} ^{T}\mathbf {B} )^{-1}}
in parallel. Then, we finish to compute
(
A
T
P
B
⊥
A
)
−
1
{\displaystyle (\mathbf {A} ^{T}\mathbf {P} _{B}^{\perp }\mathbf {A} )^{-1}}
and
(
B
T
P
A
⊥
B
)
−
1
{\displaystyle (\mathbf {B} ^{T}\mathbf {P} _{A}^{\perp }\mathbf {B} )^{-1}}
also in parallel.
Block matrix inversion
Let a block matrix be
[
A
B
C
D
]
.
{\displaystyle {\begin{bmatrix}A&B\\C&D\end{bmatrix}}.}
We can get an inverse formula by combining the previous results in.[ 1]
[
A
B
C
D
]
−
1
=
[
(
A
−
B
D
−
1
C
)
−
1
−
A
−
1
B
(
D
−
C
A
−
1
B
)
−
1
−
D
−
1
C
(
A
−
B
D
−
1
C
)
−
1
(
D
−
C
A
−
1
B
)
−
1
]
=
[
S
D
−
1
−
A
−
1
B
S
A
−
1
−
D
−
1
C
S
D
−
1
S
A
−
1
]
,
{\displaystyle {\begin{bmatrix}A&B\\C&D\end{bmatrix}}^{-1}={\begin{bmatrix}(A-BD^{-1}C)^{-1}&-A^{-1}B(D-CA^{-1}B)^{-1}\\-D^{-1}C(A-BD^{-1}C)^{-1}&(D-CA^{-1}B)^{-1}\end{bmatrix}}={\begin{bmatrix}S_{D}^{-1}&-A^{-1}BS_{A}^{-1}\\-D^{-1}CS_{D}^{-1}&S_{A}^{-1}\end{bmatrix}},}
where
S
A
{\displaystyle S_{A}}
and
S
D
{\displaystyle S_{D}}
, respectively, Schur complements of
A
{\displaystyle A}
and
D
{\displaystyle D}
, are defined by
S
A
=
D
−
C
A
−
1
B
{\displaystyle S_{A}=D-CA^{-1}B}
, and
S
D
=
A
−
B
D
−
1
C
{\displaystyle S_{D}=A-BD^{-1}C}
. This relation is derived by using Block Triangular
Decomposition. It is called simple block matrix inversion. [ 2]
Now we can obtain the inverse of the symmetric block matrix:
[
A
T
A
A
T
B
B
T
A
B
T
B
]
−
1
=
[
(
A
T
A
−
A
T
B
(
B
T
B
)
−
1
B
T
A
)
−
1
−
(
A
T
A
)
−
1
A
T
B
(
B
T
B
−
B
T
A
(
A
T
A
)
−
1
A
T
B
)
−
1
−
(
B
T
B
)
−
1
B
T
A
(
A
T
A
−
A
T
B
(
B
T
B
)
−
1
B
T
A
)
−
1
(
B
T
B
−
B
T
A
(
A
T
A
)
−
1
A
T
B
)
−
1
]
{\displaystyle {\begin{bmatrix}\mathbf {A} ^{T}\mathbf {A} &\mathbf {A} ^{T}\mathbf {B} \\\mathbf {B} ^{T}\mathbf {A} &\mathbf {B} ^{T}\mathbf {B} \end{bmatrix}}^{-1}={\begin{bmatrix}(\mathbf {A} ^{T}\mathbf {A} -\mathbf {A} ^{T}\mathbf {B} (\mathbf {B} ^{T}\mathbf {B} )^{-1}\mathbf {B} ^{T}\mathbf {A} )^{-1}&-(\mathbf {A} ^{T}\mathbf {A} )^{-1}\mathbf {A} ^{T}\mathbf {B} (\mathbf {B} ^{T}\mathbf {B} -\mathbf {B} ^{T}\mathbf {A} (\mathbf {A} ^{T}\mathbf {A} )^{-1}\mathbf {A} ^{T}\mathbf {B} )^{-1}\\-(\mathbf {B} ^{T}\mathbf {B} )^{-1}\mathbf {B} ^{T}\mathbf {A} (\mathbf {A} ^{T}\mathbf {A} -\mathbf {A} ^{T}\mathbf {B} (\mathbf {B} ^{T}\mathbf {B} )^{-1}\mathbf {B} ^{T}\mathbf {A} )^{-1}&(\mathbf {B} ^{T}\mathbf {B} -\mathbf {B} ^{T}\mathbf {A} (\mathbf {A} ^{T}\mathbf {A} )^{-1}\mathbf {A} ^{T}\mathbf {B} )^{-1}\end{bmatrix}}}
=
[
(
A
T
P
B
⊥
A
)
−
1
−
(
A
T
A
)
−
1
A
T
B
(
B
T
P
A
⊥
B
)
−
1
−
(
B
T
B
)
−
1
B
T
A
(
A
T
P
B
⊥
A
)
−
1
(
B
T
P
A
⊥
B
)
−
1
]
{\displaystyle ={\begin{bmatrix}(\mathbf {A} ^{T}\mathbf {P} _{B}^{\perp }\mathbf {A} )^{-1}&-(\mathbf {A} ^{T}\mathbf {A} )^{-1}\mathbf {A} ^{T}\mathbf {B} (\mathbf {B} ^{T}\mathbf {P} _{A}^{\perp }\mathbf {B} )^{-1}\\-(\mathbf {B} ^{T}\mathbf {B} )^{-1}\mathbf {B} ^{T}\mathbf {A} (\mathbf {A} ^{T}\mathbf {P} _{B}^{\perp }\mathbf {A} )^{-1}&(\mathbf {B} ^{T}\mathbf {P} _{A}^{\perp }\mathbf {B} )^{-1}\end{bmatrix}}}
Since the block matrix is symmetric, we also have
[
A
T
A
A
T
B
B
T
A
B
T
B
]
−
1
=
[
(
A
T
P
B
⊥
A
)
−
1
−
(
A
T
P
B
⊥
A
)
−
1
A
T
B
(
B
T
B
)
−
1
−
(
B
T
P
A
⊥
B
)
−
1
B
T
A
(
A
T
A
)
−
1
(
B
T
P
A
⊥
B
)
−
1
]
.
{\displaystyle {\begin{bmatrix}\mathbf {A} ^{T}\mathbf {A} &\mathbf {A} ^{T}\mathbf {B} \\\mathbf {B} ^{T}\mathbf {A} &\mathbf {B} ^{T}\mathbf {B} \end{bmatrix}}^{-1}={\begin{bmatrix}(\mathbf {A} ^{T}\mathbf {P} _{B}^{\perp }\mathbf {A} )^{-1}&-(\mathbf {A} ^{T}\mathbf {P} _{B}^{\perp }\mathbf {A} )^{-1}\mathbf {A} ^{T}\mathbf {B} (\mathbf {B} ^{T}\mathbf {B} )^{-1}\\-(\mathbf {B} ^{T}\mathbf {P} _{A}^{\perp }\mathbf {B} )^{-1}\mathbf {B} ^{T}\mathbf {A} (\mathbf {A} ^{T}\mathbf {A} )^{-1}&(\mathbf {B} ^{T}\mathbf {P} _{A}^{\perp }\mathbf {B} )^{-1}\end{bmatrix}}.}
Then, we can see how the Schur complements are connected to the projection matrices of the symmetric, partitioned matrix.
See also
References
External links
Key concepts Problems Hardware Software