Expt-3: Nature of Speech Signal
Objective

To study the time-varying nature of the speech signal in the time domain as well as frequency domain. To observe the time-varying nature of the excitation source and the vocal tract system for different sounds uttered by the same speaker. To study the variability of excitation and vocal tract system characteristics across speakers.

Tutorial
  1. Recording Speech Signal
    • Record/load speech signals for heed, head, hod using the provided utility.
  2. To obtain Average Pitch
    • Display the recorded speech waveform using the provided utility.
      Figure 1: Speech waveform for the utterance "heed-head-hod"


    • Zoom into the vowel segment. A segment of the vowel /a/ is shown in Figure 2.

      Figure 2: A part of the vowel /a/ in the speech segment of 'hod'

    • As shown in Figure 2, the pitch period is measured as the interval between two successive peaks of the waveform.
    • Note the period in msec for three or four consecutive cycles and compute the average pitch period, preferably in the steady portion of the vowel as illustrated in Figure 2.
    • As the pitch period is not constant over the entire vowel segment, make four or five such pitch period measurements.
    • The mean of these values gives the average pitch period; its reciprocal is the average pitch frequency (F0).
    • Compute the autocorrelation sequence for the corresponding segments. Measure the distance between the peak at zero lag and the next prominent peak at either a positive or a negative lag; this distance is the pitch period.
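
The autocorrelation measurement described above can be sketched in a few lines of NumPy. This is only an illustration, not the lab utility's own method: the function name `pitch_period_autocorr`, the 60 to 400 Hz search range, and the synthetic test signal are all assumptions introduced here.

```python
import numpy as np

def pitch_period_autocorr(segment, fs, f0_min=60.0, f0_max=400.0):
    """Estimate the pitch period (seconds) of a voiced segment from its autocorrelation.

    f0_min / f0_max bound the search range; they are assumed values, not lab settings.
    """
    x = np.asarray(segment, dtype=float)
    x = x - x.mean()                        # remove any DC offset
    r = np.correlate(x, x, mode="full")     # lags -(N-1) ... (N-1)
    r = r[len(x) - 1:]                      # keep non-negative lags; r[0] is the peak at zero lag
    lag_min = int(fs / f0_max)              # shortest pitch period considered, in samples
    lag_max = min(int(fs / f0_min), len(r) - 1)
    # The largest peak away from zero lag marks one pitch period.
    peak_lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return peak_lag / fs

# Synthetic stand-in for a steady vowel segment: 125 Hz fundamental plus one harmonic.
fs = 8000
t = np.arange(int(0.04 * fs)) / fs
x = np.sin(2 * np.pi * 125 * t) + 0.5 * np.sin(2 * np.pi * 250 * t)
T0 = pitch_period_autocorr(x, fs)
print(f"pitch period = {T0 * 1e3:.2f} ms, F0 = {1.0 / T0:.1f} Hz")
```

Restricting the search to a plausible lag range keeps the estimate from locking onto the zero-lag peak itself or onto a multiple of the true period.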

  3. To obtain Average Formant Frequencies
    • Display a voiced segment of the recorded speech for 'hod' using the provided utility.
    • Right-click to open the pop-up menu and select the spectrum section. Choose LPC section, set the order to 10 and the preemphasis to 1.0. This will display the LPC spectrum as shown in Figure 3.

      Figure 3: LPC spectrum of a short segment of vowel /a/ in the utterance of 'hod'. The spectrum is obtained using 10th order LP analysis. As shown in the figure, the peaks of the LP spectrum represent the formant frequencies.

    • Repeat the previous step for two or three short segments of the voiced region.
    • The mean of each formant over these segments gives the corresponding average value.
    • Repeat the above steps for the recorded speech of 'heed' and 'head'.
    • Repeat the above procedure for recorded speech of different speakers.
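
For readers who want to see what the LPC section computes, the sketch below estimates a 10th order LP spectrum with the autocorrelation method (Levinson-Durbin recursion) and returns the frequencies of its peaks. It is a minimal illustration under stated assumptions, not the utility's implementation: the function names, the Hamming window, the FFT size, and the synthetic three-resonance test signal are all introduced here for illustration.

```python
import numpy as np

def lpc_coefficients(frame, order=10, preemph=1.0):
    """Return [1, a1, ..., ap] of the LP inverse filter A(z), autocorrelation method."""
    x = np.asarray(frame, dtype=float)
    x = np.append(x[0], x[1:] - preemph * x[:-1])     # pre-emphasis: x[n] - preemph * x[n-1]
    x = x * np.hamming(len(x))                         # taper the frame edges (assumed window)
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]   # lags 0 .. order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):                      # Levinson-Durbin recursion
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lp_spectrum_peaks(frame, fs, order=10, preemph=1.0, nfft=1024):
    """Frequencies (Hz) of the local maxima of the LP spectrum 1/|A(f)|."""
    a = lpc_coefficients(frame, order, preemph)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    spec_db = -20.0 * np.log10(np.abs(np.fft.rfft(a, nfft)) + 1e-12)
    mid = spec_db[1:-1]
    is_peak = (mid > spec_db[:-2]) & (mid > spec_db[2:])
    return freqs[1:-1][is_peak]

# Synthetic vowel-like signal: a 100 Hz impulse train through three resonators.
# The (frequency, bandwidth) pairs are hypothetical values chosen for the demo.
fs = 10000
den = np.array([1.0])
for f, bw in [(700.0, 80.0), (1200.0, 90.0), (2500.0, 120.0)]:
    rad = np.exp(-np.pi * bw / fs)
    den = np.convolve(den, [1.0, -2.0 * rad * np.cos(2.0 * np.pi * f / fs), rad * rad])
excitation = np.zeros(800)
excitation[::100] = 1.0
y = np.zeros_like(excitation)
for n in range(len(y)):                                # direct-form all-pole filtering
    acc = excitation[n]
    for k in range(1, len(den)):
        if n - k >= 0:
            acc -= den[k] * y[n - k]
    y[n] = acc
peaks = lp_spectrum_peaks(y[500:800], fs)              # one 30 ms frame
print(np.round(peaks))
```

The peak list can contain extra maxima besides the formants, so in practice the first three peaks are still checked by eye against the displayed spectrum before being noted as F1, F2 and F3.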
  4. Observations
    • The pitch of the same speaker varies with time, and pitch also varies across speakers. This clearly illustrates the variability of the speech signal.
    • For a given sound unit, the formant frequency values vary slightly for the same speaker and also across speakers. This again illustrates the variability of speech in the frequency domain, both within a speaker and across speakers.
Procedure
  1. Record speech for three vowels in the context of heed (for vowel /i/), head (for vowel /e/) and hod (for vowel /a/).
  2. Note the average fundamental frequency (F0) using time domain representation.
  3. Note average formant frequencies (F1, F2 and F3) from 10th order LPC spectrum.
  4. Repeat for one more set of vowels.
  5. Compare the readings from the two sets (variability within a given speaker).
  6. Compare the average fundamental frequency (F0) and formant frequency values of one speaker with values from other speakers (variability among speakers).
  7. Write a brief note on the observations.
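
Steps 2 and 5 involve a small amount of arithmetic: the average F0 in Hz is the reciprocal of the average pitch period, F0 = 1000 / T0 with T0 in msec, and two sets of readings can be compared through their percentage difference. A minimal sketch follows; the pitch period readings are hypothetical placeholders, not measured values, and the helper names are introduced here only for illustration.

```python
from statistics import mean

def average_f0(pitch_periods_ms):
    """Average F0 in Hz from pitch period readings given in milliseconds."""
    return 1000.0 / mean(pitch_periods_ms)

def percent_difference(a, b):
    """Difference between two readings as a percentage of their mean."""
    return 100.0 * abs(a - b) / ((a + b) / 2.0)

# Hypothetical pitch period readings (msec) for the same vowel in two recording sets.
set_1 = [8.1, 8.0, 7.9, 8.0]
set_2 = [8.3, 8.2, 8.4, 8.3]

f0_1 = average_f0(set_1)
f0_2 = average_f0(set_2)
print(f"set 1: {f0_1:.1f} Hz, set 2: {f0_2:.1f} Hz, "
      f"difference: {percent_difference(f0_1, f0_2):.1f} %")
```

The same `percent_difference` helper can be applied to F1, F2 and F3 when comparing the two sets or two speakers.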
Experiment

Observations
  • The pitch of the same speaker varies with time, and pitch also varies across speakers. This clearly illustrates the variability of the speech signal.

  • For a given sound unit, the formant frequency values vary slightly for the same speaker. This illustrates the variability of the vocal tract system characteristics within the same speaker.

  • For a given sound unit, the formant frequency values vary slightly across different speakers. This illustrates the differences in the vocal tract system characteristics across speakers, in spite of the sound being articulated and perceived as the same. This observation is useful for discriminating between speakers. However, the formant frequencies of a given sound unit are more similar across different speakers than the formant frequencies of different sound units are within the same speaker. This observation is useful for speech recognition studies.
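
One rough way to check the last observation against your own readings is to compare distances in the (F1, F2) plane: the distance between the same vowel from two speakers versus the distance between different vowels from one speaker. The sketch below assumes plain Euclidean distance in Hz, which is a simplification, and the formant values are illustrative placeholders rather than measurements from this experiment.

```python
import itertools
import math
from statistics import mean

# Illustrative (F1, F2) values in Hz; replace with your own average readings.
formants = {
    "speaker_A": {"i": (270, 2290), "e": (530, 1840), "a": (730, 1090)},
    "speaker_B": {"i": (310, 2790), "e": (610, 2330), "a": (850, 1220)},
}

# Same vowel, different speakers.
same_vowel = mean([
    math.dist(formants[s1][v], formants[s2][v])
    for s1, s2 in itertools.combinations(formants, 2)
    for v in formants[s1]
])

# Different vowels, same speaker.
diff_vowel = mean([
    math.dist(formants[s][v1], formants[s][v2])
    for s in formants
    for v1, v2 in itertools.combinations(formants[s], 2)
])

print(f"same vowel across speakers:   {same_vowel:.0f} Hz")
print(f"different vowels, one speaker: {diff_vowel:.0f} Hz")
```

If the observation holds for your recordings, the first number should come out smaller than the second.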
Assessment
  1. Record the vowels /e/ and /o/, and measure the pitch and formant frequencies. Compare the formant frequencies with those of the vowels /a/, /i/ and /u/. Plot the coordinates of all five vowels in \( (F_1,F_2) \) space (i.e., \( F_1 \) vs \( F_2 \)).

  2. Repeat item 1 for two more speakers and plot the \( (F_1,F_2) \) coordinates of all speakers on the same graph. Write down your observations.

  3. Record the vowel /a/ in the context of different unvoiced stop consonants (i.e., /ka/, /cha/, /Ta/, /ta/ and /pa/). Measure the pitch and formant frequencies over the first few cycles of the vowel beginning. Compare them with the measurements made from the center of the vowel. Write down your observations.

  4. Record diphthongs /ai/ and /au/. Measure the pitch and formant frequencies towards the beginning, central and ending regions of the recordings. Write down your observations.
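
The \( (F_1,F_2) \) plot asked for in items 1 and 2 can be produced with Matplotlib along the following lines. This is one possible sketch: the function name `plot_vowel_space` is introduced here for illustration, and every formant value is a placeholder to be replaced by your own measurements.

```python
import matplotlib
matplotlib.use("Agg")              # render off-screen; remove this line for an interactive window
import matplotlib.pyplot as plt

def plot_vowel_space(speakers, path=None):
    """Scatter-plot (F1, F2) points per speaker; save to `path` if one is given."""
    fig, ax = plt.subplots()
    for name, vowels in speakers.items():
        f1 = [point[0] for point in vowels.values()]
        f2 = [point[1] for point in vowels.values()]
        ax.scatter(f1, f2, label=name)
        for label, (x, y) in vowels.items():
            ax.annotate(f"/{label}/", (x, y), textcoords="offset points", xytext=(4, 4))
    ax.set_xlabel("F1 (Hz)")
    ax.set_ylabel("F2 (Hz)")
    ax.legend()
    if path is not None:
        fig.savefig(path)
    return ax

# Placeholder (F1, F2) values in Hz, not measured data.
speakers = {
    "speaker 1": {"a": (700, 1100), "i": (300, 2300), "u": (320, 900),
                  "e": (500, 1900), "o": (520, 950)},
    "speaker 2": {"a": (800, 1250), "i": (330, 2700), "u": (360, 1000),
                  "e": (580, 2200), "o": (580, 1050)},
}
ax = plot_vowel_space(speakers)    # pass path="vowel_space.png" to write an image file
```

Plotting every speaker on the same axes makes it easy to see whether the points for one vowel cluster together across speakers.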

References
  • L.R. Rabiner and R.W. Schafer, Digital Processing of Speech Signals, Chapter 4, New Delhi, India : Pearson Education Inc., 1993.

  • L.R. Rabiner, Biing-Hwang Juang and B. Yegnanarayana, Fundamentals of Speech Recognition, Chapters 2 and 3, New Delhi, India (Indian Subcontinent Adaptation) : Pearson Education Inc., 2009.