Short-Time Spectrum Analysis of Speech

  • The size of the window (rectangular in this case) determines the temporal resolution, i.e., our ability to localize the frequency components within as short a time interval as possible.

  • Increasing the window size improves the spectral resolution, because the main lobe becomes narrower, but reduces the temporal resolution, because the frequency components can no longer be localized within the windowed block of the signal; the converse also holds. The smaller the window, the greater the averaging (smearing) in the spectral domain, due to the increased main-lobe width.

  • For a given window length, the rectangular window provides the best spectral resolution, as it smears (averages) the spectrum least owing to its narrow main lobe. However, spectral distortion is high because of its strong side lobes. Hamming and Hanning windows attenuate the side lobes more strongly, but at the cost of a wider main lobe.

  • The log spectrum of voiced speech clearly shows both the harmonic structure (source feature) and the formant structure (system feature), whereas the log spectrum of an unvoiced speech segment shows no well-defined structure.
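
The trade-off between main-lobe width and side-lobe level described above can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes NumPy, a window length of 64 samples, and an 8192-point zero-padded FFT, and the helper names (log_spectrum_db, first_null_bin, main_lobe_width, peak_side_lobe_db) are hypothetical, introduced only for illustration. It measures the null-to-null main-lobe width and the highest side-lobe level for rectangular, Hamming, and Hanning windows, and then shows how the rectangular main lobe narrows as the window grows.

```python
import numpy as np

NFFT = 8192  # zero-padded FFT size (assumed value, gives a dense frequency grid)

def log_spectrum_db(window, nfft=NFFT):
    """Log-magnitude spectrum of a window in dB, normalized to 0 dB at its peak."""
    mag = np.abs(np.fft.rfft(window, nfft))
    mag /= mag.max()
    return 20.0 * np.log10(np.maximum(mag, 1e-12))  # floor avoids log(0) at exact nulls

def first_null_bin(spec_db):
    """Index of the first local minimum, taken here as the edge of the main lobe."""
    return int(np.argmax(np.diff(spec_db) > 0))

def main_lobe_width(spec_db, nfft=NFFT):
    """Null-to-null main-lobe width in radians per sample."""
    return 2 * first_null_bin(spec_db) * 2 * np.pi / nfft

def peak_side_lobe_db(spec_db):
    """Highest side-lobe level in dB relative to the main-lobe peak."""
    return float(spec_db[first_null_bin(spec_db):].max())

# Same length, different window shapes: side-lobe level versus main-lobe width.
N = 64  # window length in samples (assumed value)
windows = {
    "rectangular": np.ones(N),
    "hamming": np.hamming(N),
    "hanning": np.hanning(N),
}
results = {}
for name, w in windows.items():
    spec = log_spectrum_db(w)
    results[name] = (main_lobe_width(spec), peak_side_lobe_db(spec))
    print(f"{name:12s} main lobe = {results[name][0]:.4f} rad, "
          f"peak side lobe = {results[name][1]:.1f} dB")

# Same shape (rectangular), different lengths: a longer window gives a narrower main lobe.
length_results = {n: main_lobe_width(log_spectrum_db(np.ones(n))) for n in (32, 64, 128)}
for n, width in length_results.items():
    print(f"rectangular, N = {n:3d}: main lobe = {width:.4f} rad")
```

Under these assumptions the rectangular window should report the narrowest main lobe but the highest side lobes, while the Hamming and Hanning windows should show roughly twice the main-lobe width with much lower side lobes. Doubling the rectangular window length should roughly halve its main-lobe width, which is the spectral-resolution gain that is paid for with coarser time localization.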