Expt-8: Linear Prediction Analysis of Speech
Objective

The primary objective of this experiment is to study the characteristics of speech using linear prediction (LP) analysis. This includes observing the LP spectrum and LP residual for voiced and unvoiced segments, studying the effect of the LP order on the normalized error, examining the autocorrelation of the signal and of the LP residual for voiced and unvoiced segments, and studying glottal pulse shapes.

Tutorial

Source-system modeling of speech signals using LP analysis

The vocal tract system can be modeled as a time-varying all-pole filter using segmental analysis. Segmental analysis processes speech in short (10-30 ms), overlapping (5-15 ms) windows. The vocal tract system is assumed to be stationary within each window and is modeled as an all-pole filter of order \( p \) using linear prediction (LP) analysis. LP analysis works on the principle that a sample value in a correlated, stationary sequence can be predicted as a linear weighted sum of the past \( p \) samples. If \( s(n) \) denotes a sequence of speech samples, then the predicted value at time instant \( n \) is given by $$ \hat{s}(n) = \sum_{k=1}^{p}{a_k~s(n-k)} $$ where \( \{a_k\},~k=1,2,...,p \) is the set of linear predictor coefficients (LPCs) and \( p \) is the order of the LP filter. The error at time \( n \) and the sum of squared errors \( E \) are given by $$ r(n)~=~s(n)~-~\hat{s}(n) $$ $$ E=~\sum_{n}{r^2(n)} $$ The cost function \( E \) is minimized with respect to \( \{a_i\},~i=1,2,...,p \) over the interval \( {-\infty}~{\leq}~n~{\leq}~{\infty} \) (autocorrelation formulation, with the windowed signal taken as zero outside the analysis window) by setting $$ {\partial{E}}/{\partial{a_i}}~=~0 \quad\quad 1~{\leq}~i~{\leq}~p $$ This minimization leads to a set of normal equations, $$ \sum_{k=1}^{p}{a_k~R(i-k)} = R(i) \quad\quad 1~{\leq}~i~{\leq}~p $$ where $$ R(i) = \sum_{n=-\infty}^{\infty}s(n)~s(n+i) \quad\quad -{\infty}~{\leq}~i~{\leq}~{\infty} $$ is the autocorrelation sequence of the signal. Solving these normal equations gives the values of the predictor coefficients \( \{a_k\},~k=1,2,...,p \). The error signal \( r(n) \), obtained by inverse filtering the speech signal, is referred to as the LP residual. The smooth (highly correlated) variations in the speech signal are captured by the LPCs and are attributed to the vocal tract characteristics. 
The complex poles of the LP filter occur in conjugate pairs, and each pair represents a resonator cavity whose response is maximum near a frequency (called the resonant frequency) determined by the angle of the poles in the z-plane. The vocal tract can be considered as a cascade of resonator cavities with different shapes and sizes. The resonant frequencies of these cavities are referred to as formants. For voiced speech, the LP residual signal has large error values at regular intervals, which can be attributed to the periodic impulses of excitation. Hence the LP residual is a good approximation to the excitation source signal and can be used further to extract the excitation source characteristics. A segment of voiced speech (windowed), the frequency response of the inverse filter and the corresponding LP residual are shown in Fig. 1.

Figure 1: Inverse filtering the speech signal for estimating the excitation source (LP residual) signal.
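
The autocorrelation formulation described above can be sketched in a few lines of Python. This is a minimal illustration, not the lab's own implementation: it assumes NumPy and SciPy are available, applies a Hamming window (a common but assumed choice), and solves the normal equations with a generic Toeplitz solver. The function names lp_analysis and pole_frequencies, and the synthetic single-resonance test signal, are hypothetical and introduced only for this example.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter


def lp_analysis(frame, order):
    """Autocorrelation-method LP analysis of one frame.

    Returns (a, residual, norm_err), where a[k-1] is a_k in
    s_hat(n) = sum_k a_k s(n-k).
    """
    w = frame * np.hamming(len(frame))        # assumed window choice
    full = np.correlate(w, w, mode="full")    # lags -(N-1) .. (N-1)
    R = full[len(w) - 1:len(w) + order]       # R(0) .. R(p)
    # Normal equations: sum_k a_k R(|i-k|) = R(i), i = 1..p (Toeplitz system)
    a = solve_toeplitz(R[:order], R[1:order + 1])
    # Inverse filter A(z) = 1 - sum_k a_k z^-k yields the LP residual
    inverse = np.concatenate(([1.0], -a))
    residual = lfilter(inverse, [1.0], w)
    # Minimum prediction error divided by the signal energy R(0)
    norm_err = (R[0] - a @ R[1:order + 1]) / R[0]
    return a, residual, norm_err


def pole_frequencies(a, fs):
    """Frequencies (Hz) of the upper-half-plane poles of 1/A(z)."""
    poles = np.roots(np.concatenate(([1.0], -a)))
    poles = poles[np.imag(poles) > 0]         # one pole per conjugate pair
    return np.sort(np.angle(poles) * fs / (2 * np.pi))


# Synthetic stand-in for speech: white noise through one resonator at 1 kHz
fs = 10_000
rng = np.random.default_rng(0)
true_a = np.array([2 * 0.95 * np.cos(2 * np.pi * 1000 / fs), -0.95 ** 2])
s = lfilter([1.0], np.concatenate(([1.0], -true_a)), rng.standard_normal(2000))

a, res, err = lp_analysis(s, order=2)
freqs = pole_frequencies(a, fs)
print("estimated a_k:", a)
print("normalized error:", err)
print("pole frequency (Hz):", freqs)
```

For real speech at 10 kHz, a higher order (roughly 10 to 14 is a common rule of thumb) would typically be used, and the pole angles then give rough formant estimates.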

Collection of voiced (/a/) and unvoiced (/s/) speech segments

  • Record the vowel /a/ for one second at a 10 kHz sampling frequency with 16-bit quantization. From this recording, extract a 200 ms segment from the steady portion of the waveform.
  • Record the unvoiced sound /s/ for one second at a 10 kHz sampling frequency with 16-bit quantization. From this recording, extract a 200 ms segment from the steady portion of the waveform.
  • The short voiced and unvoiced speech segments are shown in the corresponding figure.
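
The segment extraction in the steps above might look like the following sketch. It assumes the steady portion lies in the middle of the one-second recording, which should be checked visually on real data; the helper name steady_segment is hypothetical, and a synthetic 120 Hz tone stands in for an actual 16-bit recording.

```python
import numpy as np


def steady_segment(x, fs, dur=0.2):
    """Return the centered `dur`-second slice of x.

    Assumes the middle of the recording is steady; inspect the
    waveform and adjust the start point for real recordings.
    """
    n = int(round(dur * fs))          # 200 ms at 10 kHz -> 2000 samples
    start = (len(x) - n) // 2
    return x[start:start + n]


fs = 10_000                            # 10 kHz sampling frequency
t = np.arange(fs) / fs                 # one second of sample times
tone = 0.5 * np.sin(2 * np.pi * 120 * t)          # stand-in for a recording
x16 = np.round(tone * 32767).astype(np.int16)     # 16-bit quantization

seg = steady_segment(x16, fs)
print(len(seg), seg.dtype)
```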