Harmonic Vector Excitation Coding

Harmonic Vector Excitation Coding (HVXC) is a speech coding algorithm specified in the MPEG-4 Part 3 (MPEG-4 Audio) standard for very low bit rate speech coding. HVXC supports bit rates of 2 and 4 kbit/s in both fixed and variable bit rate modes, at a sampling frequency of 8 kHz. It can also operate at lower bit rates, such as 1.2 to 1.7 kbit/s, using a variable bit rate technique.^[1] The total algorithmic delay of the encoder and decoder is 36 ms.^[2]

It was published as subpart 2 of ISO/IEC 14496-3:1999 (MPEG-4 Audio) in 1999.^[3] An extended version of HVXC was published in MPEG-4 Audio Version 2 (ISO/IEC 14496-3:1999/Amd 1:2000).^[4]^[5]

The MPEG-4 Natural Speech Coding Tool Set uses two algorithms: HVXC and CELP (Code Excited Linear Prediction). HVXC is used at the low bit rates of 2 and 4 kbit/s. CELP covers 3.85 kbit/s and bit rates above 4 kbit/s.^[6]

Technology

Linear Predictive Coding

HVXC is a parametric speech codec, which in practice means that it is optimized for speech signals only. Linear predictive coding (LPC) is used to synthesize speech from a residual signal. The LPC parameters are transformed into line spectral pair (LSP) coefficients, which are jointly quantized. The LPC residual signal is classified as either voiced or unvoiced.^[2]
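As a rough illustration of the analysis step, the sketch below estimates LPC coefficients with the autocorrelation method and the Levinson-Durbin recursion, then obtains the residual by inverse filtering. The function names, the predictor order, and the lack of windowing are assumptions made for illustration. The conversion to LSP coefficients and their quantization are not shown, and this is not the procedure defined in the standard.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], the (biased) autocorrelation of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients from the autocorrelation r.

    The predictor is x_hat[n] = sum_k a[k] * x[n - k], k = 1..order.
    Returns (coeffs, error): coeffs[0] belongs to lag 1, and error is the
    remaining prediction error energy.
    """
    a = [0.0] * (order + 1)          # a[0] is unused padding
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:             # silent or degenerate frame
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error              # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        error *= 1.0 - k * k
    return a[1:], error


def lpc_residual(frame, coeffs):
    """Inverse-filter the frame: e[n] = x[n] - sum_k a[k] * x[n - k]."""
    residual = []
    for n, sample in enumerate(frame):
        prediction = sum(c * frame[n - 1 - k]
                         for k, c in enumerate(coeffs) if n - 1 - k >= 0)
        residual.append(sample - prediction)
    return residual
```

For a decaying exponential such as `[0.9 ** n for n in range(200)]`, a second-order analysis yields a first coefficient close to 0.9 and a second close to zero, and the residual carries far less energy than the input.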

Voiced (Harmonic) Residual Coding

In voiced segments, two parameters describing the residual signal are transmitted: the pitch period and the spectral envelope.^[2] The spectral envelope is represented by amplitude values, one value per harmonic (each harmonic being a multiple of the pitch frequency).^[2] The pitch period is determined from the autocorrelation function. To extract the spectral envelope parameter, the LPC residual signal is transformed into the DFT domain.^[2] The DFT spectrum is segmented into bands, one band per harmonic. The frequency band for the m-th harmonic consists of the DFT coefficients from (m-1/2)ω₀ to (m+1/2)ω₀, ω₀ being the pitch frequency. The amplitude value for the m-th harmonic is chosen to optimally represent these DFT coefficients.^[2] The spectral envelope is then coded using variable-dimension weighted vector quantization. This process is also referred to as Harmonic VQ.
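The two voiced-segment parameters can be sketched as follows. This is a simplified illustration, not the method of the standard: the pitch search maximizes a normalized autocorrelation over a caller-supplied lag range, and each harmonic amplitude is chosen so that a sinusoid of that amplitude carries the same energy as the DFT bins in the harmonic's band. That energy-matching rule is an assumed stand-in for the optimal fit mentioned above. All function names and the `tie_eps` parameter are hypothetical.

```python
import cmath
import math


def estimate_pitch_lag(residual, min_lag, max_lag, tie_eps=1e-6):
    """Pick the lag with the highest normalized autocorrelation.

    A later lag replaces the current best only if it is better by more than
    tie_eps, which favours the shortest period over its multiples.
    """
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        n = len(residual) - lag
        if n <= 0:
            break
        cross = sum(residual[i] * residual[i + lag] for i in range(n))
        e0 = sum(residual[i] ** 2 for i in range(n))
        e1 = sum(residual[i + lag] ** 2 for i in range(n))
        denom = math.sqrt(e0 * e1)
        score = cross / denom if denom > 0.0 else 0.0
        if score > best_score + tie_eps:
            best_lag, best_score = lag, score
    return best_lag


def dft_half(frame):
    """Naive DFT for bins 0..N/2 (adequate for a short illustrative frame)."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame))
            for k in range(n // 2 + 1)]


def harmonic_amplitudes(residual, pitch_lag):
    """Return one amplitude per harmonic of the pitch frequency.

    Harmonic m owns the DFT bins between (m - 1/2) and (m + 1/2) times the
    pitch frequency. Its amplitude is set so that a sinusoid of that
    amplitude has the same energy as the bins in the band.
    """
    n = len(residual)
    spectrum = dft_half(residual)
    bins_per_harmonic = n / pitch_lag      # pitch frequency in DFT bins
    amplitudes = []
    m = 1
    while True:
        lo = math.ceil((m - 0.5) * bins_per_harmonic)
        hi = math.ceil((m + 0.5) * bins_per_harmonic)
        if hi > n // 2:                    # band would cross Nyquist
            break
        energy = sum(abs(spectrum[k]) ** 2 for k in range(lo, hi))
        amplitudes.append(2.0 / n * math.sqrt(energy))
        m += 1
    return amplitudes
```

Because the number of harmonics depends on the pitch lag, the returned list changes length from frame to frame, which is why the envelope needs a variable-dimension quantizer.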

To make speech with a mixture of voiced and unvoiced excitation sound more natural and smooth, three different modes of voiced speech (Mixed Voiced-1, Mixed Voiced-2, Full Voiced) are differentiated.^[2] The degree of voicing is determined by the value of the normalized autocorrelation function at the pitch period. Depending on the chosen mode, the decoder adds a different amount of band-pass Gaussian noise to the synthesized harmonic signal.
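A minimal sketch of the voicing decision, assuming the normalized autocorrelation at the pitch lag is simply compared against fixed thresholds. The threshold values are hypothetical placeholders, not figures from the standard, and the assumption that Mixed Voiced-1 denotes the less voiced of the two mixed modes is also made here only for illustration.

```python
import math

# Ordered from least to most voiced (ordering of the mixed modes is assumed).
VOICING_MODES = ("Unvoiced", "Mixed Voiced-1", "Mixed Voiced-2", "Full Voiced")


def normalized_autocorrelation(frame, lag):
    """Normalized autocorrelation of the frame at one lag, in [-1, 1]."""
    n = len(frame) - lag
    if n <= 0:
        return 0.0
    cross = sum(frame[i] * frame[i + lag] for i in range(n))
    e0 = sum(frame[i] ** 2 for i in range(n))
    e1 = sum(frame[i + lag] ** 2 for i in range(n))
    denom = math.sqrt(e0 * e1)
    return cross / denom if denom > 0.0 else 0.0


def classify_voicing(rho, thresholds=(0.3, 0.6, 0.85)):
    """Map a voicing measure to a mode; thresholds are hypothetical."""
    level = sum(1 for t in thresholds if rho >= t)
    return VOICING_MODES[level]
```

A strongly periodic frame gives a value near 1 at its pitch lag and is labelled Full Voiced, while a frame with little correlation at that lag falls through to Unvoiced.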

Voiceless (VXC) Residual Coding

Unvoiced segments are encoded according to the CELP scheme, which is also referred to as vector excitation coding (VXC).^[2] The CELP coding in HVXC is performed using only a stochastic codebook. Other CELP codecs additionally use a dynamic codebook to perform long-term prediction of voiced segments. However, since HVXC does not use CELP for voiced segments, the dynamic codebook is omitted from the design.
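The idea of a stochastic-codebook search can be sketched as a gain-shape match: for every codevector the optimal gain is computed in closed form, and the entry that leaves the smallest squared error is selected. This is a simplified illustration under stated assumptions. A real CELP encoder compares signals after synthesis filtering and perceptual weighting, which is omitted here, and the codebook size, vector dimension, and function names are hypothetical.

```python
import random


def make_stochastic_codebook(size, dim, seed=0):
    """Build a fixed codebook of Gaussian random vectors (illustrative)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(size)]


def search_codebook(target, codebook):
    """Return (index, gain) minimizing ||target - gain * codevector||^2.

    For a given codevector c the best gain is <t, c> / <c, c>, and the error
    drops by <t, c>^2 / <c, c>, so the search maximizes that quantity.
    """
    best_index, best_gain, best_score = 0, 0.0, -1.0
    for index, vec in enumerate(codebook):
        energy = sum(v * v for v in vec)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, vec))
        score = corr * corr / energy
        if score > best_score:
            best_index, best_gain, best_score = index, corr / energy, score
    return best_index, best_gain


def decode_excitation(index, gain, codebook):
    """Decoder side: scale the selected codevector by the transmitted gain."""
    return [gain * v for v in codebook[index]]
```

Only the index and the gain need to be transmitted, since encoder and decoder share the same codebook. No adaptive (dynamic) codebook stage appears in the sketch, mirroring the design choice described above.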

References

[1] ISO/IEC (2009-09-01). "ISO/IEC 14496-3:2009 - Information technology -- Coding of audio-visual objects -- Part 3: Audio" (PDF). IEC. Retrieved 2009-10-07.
[2] Masayuki Nishiguchi (2006-04-17). "Harmonic vector excitation coding of speech" (PDF). Acoustical Science and Technology. Retrieved 2009-10-09.
[3] ISO (1999). "ISO/IEC 14496-3:1999 - Information technology -- Coding of audio-visual objects -- Part 3: Audio". ISO. Retrieved 2009-10-09.
[4] ISO (2000). "ISO/IEC 14496-3:1999/Amd 1:2000 - Audio extensions". ISO. Retrieved 2009-10-07.
[5] ISO/IEC JTC 1/SC 29/WG 11 (1999-07). "ISO/IEC 14496-3:/Amd.1 - Final Committee Draft - MPEG-4 Audio Version 2" (PDF). Retrieved 2009-10-07.
[6] Karlheinz Brandenburg, Oliver Kunz, Akihiko Sugiyama. "MPEG-4 Natural Audio Coding - Natural Speech Coding Tools" (PDF). Retrieved 2013-03-25.
