Compressed sensing in speech signals

Compressed sensing (CS) can be used to reconstruct a sparse vector from a small number of measurements, provided the signal can be represented in a sparse domain. A sparse domain is one in which only a few of the signal's coefficients are non-zero. Suppose a signal ${x\in R^{N}}$ can be represented in a domain where only ${\it {K}}$ coefficients out of ${\it {N}}$ (where ${K\ll N}$) are non-zero; the signal is then said to be sparse in that domain. The reconstructed sparse vector can be used to recover the original signal if the sparse domain of the signal is known. CS can therefore be applied to speech signals only if a sparse domain for speech is known.

Consider a speech signal ${x}$ that can be represented in a domain ${\Psi }$ such that ${x=\Psi {\boldsymbol {\alpha }}}$, where the speech signal ${x\in R^{\it {N}}}$, the dictionary matrix ${\Psi \in R^{\it {N\times N}}}$ and the sparse coefficient vector ${{\boldsymbol {\alpha }}\in R^{\it {N}}}$. The speech signal is said to be sparse in the domain ${\Psi }$ if the number of significant (non-zero) coefficients in ${\boldsymbol {\alpha }}$ is ${\it {K}}$, where ${\it {K\ll N}}$.
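As a minimal sketch of the model x = Ψα, the example below assumes Ψ is the orthonormal DCT-II basis (a common choice for speech frames, but an assumption here, not something the text prescribes) and uses a synthetic frame of N = 256 samples that is exactly K = 8 sparse. Real speech frames are only approximately sparse, with many small but non-zero coefficients. The helper `dct_matrix` is hypothetical, written only for this illustration.

```python
import numpy as np

def dct_matrix(n):
    """Hypothetical helper: orthonormal DCT-II matrix C (rows are basis vectors), C @ C.T == I."""
    k = np.arange(n)[:, None]          # frequency index
    i = np.arange(n)[None, :]          # time index
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)            # scale the DC row so every row has unit norm
    return C

N, K = 256, 8
rng = np.random.default_rng(0)

# Assumed dictionary Psi: columns are DCT basis vectors, so x = Psi @ alpha.
Psi = dct_matrix(N).T

# Synthetic K-sparse coefficient vector (random support and amplitudes).
alpha = np.zeros(N)
alpha[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)

x = Psi @ alpha                        # time-domain frame: dense in time

# Psi is orthonormal, so the coefficients are recovered by Psi.T @ x.
alpha_back = Psi.T @ x

print("non-zero samples in x:         ", np.count_nonzero(np.abs(x) > 1e-9))
print("non-zero coefficients in alpha:", np.count_nonzero(np.abs(alpha_back) > 1e-9))
```

The frame has non-zero values at essentially every sample, yet only K of its N coefficients in Ψ are non-zero.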

The observed signal ${x}$ has dimension ${\it {N\times 1}}$. To reduce the number of observations from which ${\boldsymbol {\alpha }}$ must be estimated, CS observes the speech signal through a measurement matrix ${\Phi }$ such that

$$y=\Phi x\qquad (1)$$

where ${y\in R^{\it {M}}}$ and the measurement matrix ${\Phi \in R^{\it {M\times N}}}$, with ${\it {M\ll N}}$.
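The measurement step of Eq. (1) is a single matrix-vector product. This sketch assumes a Gaussian Φ with independent N(0, 1/M) entries (one common random choice; the 1/M variance keeps the expected energy of Φx equal to that of x) and a synthetic two-tone frame at an assumed 8 kHz sampling rate standing in for real speech.

```python
import numpy as np

fs = 8000                      # assumed sampling rate (Hz)
N, M = 256, 64                 # frame length and number of measurements, M << N
t = np.arange(N) / fs

# Synthetic stand-in for a speech frame: two tones (not real speech).
x = np.cos(2 * np.pi * 200 * t) + 0.5 * np.cos(2 * np.pi * 400 * t)

rng = np.random.default_rng(1)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # assumed Gaussian measurement matrix

y = Phi @ x                    # Eq. (1): M compressive measurements of the frame
print(x.shape, "->", y.shape, f"(ratio M/N = {M / N:.2f})")
```

Only the M values in y would be stored or transmitted; the N samples of x are not kept.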

The sparse decomposition problem for Eq. (1) can be solved by the standard ${l_{1}}$ minimization^[1]

$$\hat{\boldsymbol{\alpha}}=\arg\min_{\boldsymbol{\alpha}}\;\Vert\boldsymbol{\alpha}\Vert_{1}\quad\text{s.t.}\quad y=\Phi x=\Phi\Psi\boldsymbol{\alpha}=A\boldsymbol{\alpha},\quad\text{where}\;A=\Phi\Psi\qquad (2)$$
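Problem (2) is not written as a linear program, but the usual trick of splitting α into non-negative parts turns it into one, which is one common way general-purpose LP solvers are applied to it. At the optimum u and v never share a non-zero index, so the objective equals the ℓ1 norm:

```latex
\begin{aligned}
\boldsymbol{\alpha} &= \mathbf{u}-\mathbf{v}, \qquad \mathbf{u},\mathbf{v}\ge 0,\\
(\hat{\mathbf{u}},\hat{\mathbf{v}}) &= \arg\min_{\mathbf{u},\mathbf{v}\ge 0}\;
  \mathbf{1}^{T}(\mathbf{u}+\mathbf{v})
  \quad\text{s.t.}\quad
  \begin{bmatrix} A & -A \end{bmatrix}
  \begin{bmatrix} \mathbf{u}\\ \mathbf{v} \end{bmatrix} = y,\\
\hat{\boldsymbol{\alpha}} &= \hat{\mathbf{u}}-\hat{\mathbf{v}},
  \qquad \mathbf{1}^{T}(\hat{\mathbf{u}}+\hat{\mathbf{v}}) = \Vert\hat{\boldsymbol{\alpha}}\Vert_{1}.
\end{aligned}
```

The resulting LP has 2N non-negative variables and M equality constraints.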

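If the measurement matrix ${\Phi }$ satisfies the restricted isometry property (RIP) and is incoherent with the dictionary matrix ${\Psi }$,^[2] then the reconstructed signal is close to the original speech signal. Under these conditions, ${l_{1}}$ minimization recovers an exactly ${\it {K}}$-sparse ${\boldsymbol {\alpha }}$ without error, and for approximately sparse signals such as speech the error is bounded in terms of how well ${\boldsymbol {\alpha }}$ is approximated by its ${\it {K}}$ largest coefficients.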
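RIP is computationally hard to certify for a given matrix, so this sketch checks only incoherence, using the mutual coherence μ(Φ, Ψ) = √N · max |⟨φ_i, ψ_j⟩| over unit-norm rows φ_i of Φ and columns ψ_j of Ψ. It always lies between 1 and √N, and smaller is better. The DCT dictionary and the helper names `dct_matrix` and `mutual_coherence` are assumptions for illustration; the sketch compares a Gaussian Φ with the worst case, a Φ built from rows of the dictionary itself.

```python
import numpy as np

def dct_matrix(n):
    """Hypothetical helper: orthonormal DCT-II matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

def mutual_coherence(Phi, Psi):
    """Hypothetical helper: sqrt(N) * max |<phi_i, psi_j>| over unit-norm rows/columns."""
    n = Psi.shape[0]
    rows = Phi / np.linalg.norm(Phi, axis=1, keepdims=True)   # normalise rows of Phi
    cols = Psi / np.linalg.norm(Psi, axis=0, keepdims=True)   # normalise columns of Psi
    return np.sqrt(n) * np.max(np.abs(rows @ cols))

N, M = 256, 64
Psi = dct_matrix(N).T                      # assumed DCT dictionary

rng = np.random.default_rng(2)
Phi_gauss = rng.standard_normal((M, N))    # random measurements
Phi_dct = dct_matrix(N)[:M]                # M rows of the dictionary itself

mu_g = mutual_coherence(Phi_gauss, Psi)    # much smaller than sqrt(N)
mu_d = mutual_coherence(Phi_dct, Psi)      # maximal: equals sqrt(N)
print(f"Gaussian Phi: {mu_g:.2f}   DCT-row Phi: {mu_d:.2f}   sqrt(N): {np.sqrt(N):.0f}")
```

A DCT-row Φ measures only M of the coefficients directly and misses the rest, which is why maximal coherence is undesirable.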

Different types of measurement matrices can be used for speech signals, including random matrices such as Bernoulli matrices^[3] and deterministic constructions.^[4] Estimating the sparsity of a speech signal is difficult because speech varies strongly over time, and so does its sparsity. Ideally the sparsity would be tracked over time at low computational cost; if this is not possible, the worst-case (largest) sparsity of a given speech signal can be assumed instead.
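A sketch of the worst-case strategy, under several labelled assumptions. Per-frame sparsity is measured as the number of DCT coefficients needed to hold 99% of the frame's energy (one possible definition, not taken from the cited papers). The number of measurements is then chosen with the usual CS scaling M ≈ C·K·log(N/K), where the constant C = 2 is purely illustrative. The three frames are synthetic, with known sparsities 3, 10 and 6. The matrix is a random Bernoulli matrix with entries ±1/√M, the kind of matrix studied in ^[3]. The helper names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Hypothetical helper: orthonormal DCT-II matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

def frame_sparsity(frame, Psi, energy=0.99):
    """Hypothetical helper: fewest coefficients of Psi.T @ frame holding `energy` of its energy."""
    e = np.sort((Psi.T @ frame) ** 2)[::-1]        # coefficient energies, largest first
    cumulative = np.cumsum(e) / np.sum(e)
    return int(np.searchsorted(cumulative, energy) + 1)

N = 256
Psi = dct_matrix(N).T                               # assumed DCT dictionary

# Synthetic frames whose sparsity changes over time (3, 10, then 6 coefficients).
frames = []
for start, k in [(5, 3), (20, 10), (40, 6)]:
    alpha = np.zeros(N)
    alpha[start:start + k] = 1.0
    frames.append(Psi @ alpha)

sparsities = [frame_sparsity(f, Psi) for f in frames]
k_worst = max(sparsities)                           # worst case over the whole signal

C = 2.0                                             # illustrative constant (assumption)
M = int(np.ceil(C * k_worst * np.log(N / k_worst)))

rng = np.random.default_rng(3)
Phi = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(M)   # random Bernoulli matrix

print("per-frame sparsity:", sparsities, "worst case:", k_worst, "-> M =", M)
```

Sizing M from the worst frame wastes measurements on sparser frames but avoids under-sampling the least sparse one.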

The sparse vector ${\hat {\boldsymbol {\alpha }}}$ for a given speech signal is reconstructed from the small number of measurements ${y}$ using ${l_{1}}$ minimization.^[1] The original speech signal is then reconstructed from ${\hat {\boldsymbol {\alpha }}}$ using the fixed dictionary matrix ${\Psi }$ as ${\hat {x}=\Psi {\hat {\boldsymbol {\alpha }}}}$.^[5]
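An end-to-end sketch of this pipeline on one synthetic frame, assuming a DCT dictionary Ψ, a Gaussian Φ and an exactly K-sparse frame. The ℓ1 problem (2) is solved as a linear program with SciPy's `linprog` (HiGHS backend) by splitting α = u − v with u, v ≥ 0. The helper names are hypothetical, and exact recovery should not be expected for real speech, which is only approximately sparse.

```python
import numpy as np
from scipy.optimize import linprog

def dct_matrix(n):
    """Hypothetical helper: orthonormal DCT-II matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

def basis_pursuit(A, y):
    """Hypothetical helper: min ||a||_1 s.t. A a = y, as an LP with a = u - v, u, v >= 0."""
    n = A.shape[1]
    res = linprog(c=np.ones(2 * n),                  # objective: sum(u) + sum(v)
                  A_eq=np.hstack([A, -A]), b_eq=y,   # constraint: A (u - v) = y
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]

N, M, K = 256, 64, 5
rng = np.random.default_rng(4)

Psi = dct_matrix(N).T                                # fixed (assumed) dictionary
alpha = np.zeros(N)
alpha[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)
x = Psi @ alpha                                      # synthetic original frame

Phi = rng.standard_normal((M, N)) / np.sqrt(M)       # assumed Gaussian measurement matrix
y = Phi @ x                                          # only y is kept

alpha_hat = basis_pursuit(Phi @ Psi, y)              # l1 recovery with A = Phi Psi
x_hat = Psi @ alpha_hat                              # x_hat = Psi alpha_hat

rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(f"relative reconstruction error: {rel_err:.2e}")
```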

Both the dictionary matrix and the sparse vector have been estimated iteratively from the random measurements alone.^[6] The speech signal reconstructed from the estimated sparse vector and dictionary matrix is reported to be closer to the original signal than one obtained with a fixed dictionary. Further iterative approaches that estimate both the dictionary matrix and the speech signal from random measurements alone are described by Chetupally and Sreenivas.^[7]

The sparsity of speech signals has yet to be exploited in several areas of speech processing. The motivating question for CS of speech is whether algorithms can be devised that use only the random measurements ${y}$, without first reconstructing the signal, for applications such as speaker recognition and speech enhancement.
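The recovery in ^[6] uses iterative hard thresholding (IHT). The sketch below shows only the basic IHT update α ← H_K(α + μAᵀ(y − Aα)), where H_K keeps the K largest-magnitude entries. It assumes a fixed DCT dictionary, a known sparsity K and a step μ = 1/‖A‖₂², and it does not implement the signal-adaptive transform of ^[6] or any joint dictionary estimation. The helper names and the synthetic frame are assumptions for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Hypothetical helper: orthonormal DCT-II matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

def hard_threshold(v, k):
    """H_K: keep the k largest-magnitude entries of v and zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def iht(A, y, k, n_iter=300):
    """Basic IHT with fixed step 1 / ||A||_2^2; also records the squared residual."""
    mu = 1.0 / np.linalg.norm(A, 2) ** 2          # spectral norm of A
    alpha = np.zeros(A.shape[1])
    residuals = []
    for _ in range(n_iter):
        alpha = hard_threshold(alpha + mu * A.T @ (y - A @ alpha), k)
        residuals.append(np.sum((y - A @ alpha) ** 2))
    return alpha, residuals

N, M, K = 256, 128, 3
rng = np.random.default_rng(5)
Psi = dct_matrix(N).T                              # fixed (assumed) dictionary
alpha = np.zeros(N)
alpha[rng.choice(N, size=K, replace=False)] = [1.0, -1.0, 1.0]
x = Psi @ alpha                                    # synthetic K-sparse frame

Phi = rng.standard_normal((M, N)) / np.sqrt(M)
A = Phi @ Psi
y = Phi @ x

alpha_hat, residuals = iht(A, y, K)
x_hat = Psi @ alpha_hat
print("final residual:", residuals[-1])
print("relative error:", np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```

With μ ≤ 1/‖A‖₂², each IHT step cannot increase the residual ‖y − Aα‖², which makes this fixed step a safe, if conservative, choice.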

References

[1] Donoho D., Compressed sensing, IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289-1306, 2006
[2] Candès E., Romberg J. and Tao T., Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489-509, 2006
[3] Zhang G., Jiao S., Xu X. and Wang L., Compressed sensing and reconstruction with Bernoulli matrices, IEEE International Conference on Information and Automation (ICIA), pp. 455-460, 2010
[4] Li K., Ling C. and Gan L., Deterministic compressed-sensing matrices: Where Toeplitz meets Golay, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3748-3751, 2011
[5] Christensen M., Østergaard J. and Jensen S., On compressed sensing and its application to speech and audio signals, Forty-Third Asilomar Conference on Signals, Systems and Computers, pp. 356-360, 2009
[6] Raj C. S. and Sreenivas T. V., Time-varying signal adaptive transform and IHT recovery of compressive sensed speech, INTERSPEECH, pp. 73-76, 2011
[7] Chetupally S. R. and Sreenivas T. V., Joint pitch-analysis formant-synthesis framework for CS recovery of speech, INTERSPEECH, 2012

Kundu A. and Roy P. K., Sparse signal recovery from nonadaptive linear measurements, http://arxiv.org/abs/1310.8468
