Expt-2 : Speech production mechanism
Objective

The objective of this experiment is to express the characteristics of speech in terms of its production characteristics. This is done by locating the voiced/unvoiced/plosive/silence regions and providing an acoustic-phonetic description of each region.

Tutorial

1 Identifying the Voiced/Unvoiced/Plosive/Silence regions

From time domain waveform:

  1. Voiced: quasi-periodic regions of relatively high amplitude
  2. Unvoiced: non-periodic regions that look like random noise
  3. Plosive: a noise-like burst that marks the sudden release of a constriction at some place in the vocal tract system
  4. Silence: no speech signal (zero or near-zero amplitude)
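
The cues above can be turned into a rough frame-level labeling. The following is a minimal sketch, assuming short-time energy as the amplitude cue and zero-crossing rate as the periodic-versus-noisy cue. The function names, the sampling rate and both thresholds are assumptions made for illustration and would need tuning on real recordings. Plosive bursts are not detected here.

```python
import math
import random

def frame_features(frame):
    """Return (mean energy, zero-crossing rate) for one frame of samples."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    zcr = crossings / (len(frame) - 1)
    return energy, zcr

def classify_frame(frame, silence_energy=1e-4, zcr_threshold=0.25):
    """Label a frame as 'silence', 'voiced' or 'unvoiced'.

    Both thresholds are illustrative assumptions, not values from the text.
    """
    energy, zcr = frame_features(frame)
    if energy < silence_energy:
        return "silence"
    # Noise-like frames cross zero far more often than quasi-periodic ones.
    return "unvoiced" if zcr > zcr_threshold else "voiced"

fs = 8000   # assumed sampling rate in Hz
n = 240     # 30 ms frame at 8 kHz
rng = random.Random(0)

# Synthetic stand-ins for the three region types (not recorded speech).
voiced = [0.5 * math.sin(2 * math.pi * 120 * t / fs) for t in range(n)]
unvoiced = [0.05 * rng.uniform(-1, 1) for _ in range(n)]
silence = [0.0] * n

labels = [classify_frame(f) for f in (voiced, unvoiced, silence)]
print(labels)
```

On the synthetic frames the sine is labeled voiced, the low-level noise unvoiced, and the all-zero frame silence. Real speech will need the visual and listening checks described in the procedure.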

2 Acoustic phonetic description

This description is based on the theory of acoustic phonetics. Acoustic phonetics deals with the study of the physical properties of speech sounds as transmitted between mouth and ear. The description is given in terms of place of articulation (POA) and manner of articulation (MOA). Tables 1 and 2 are used to describe speech sounds, both vowels (V) and consonants (C), using manner of articulation and place of articulation.

Example: kitAb mEj par hai

kitAb ( /k/, /i/, /t/, /A/, /b/)

Unvoiced unaspirated velar stop followed by front vowel followed by unvoiced unaspirated dental stop followed by middle vowel followed by voiced unaspirated bilabial stop.

mEj ( /m/, /E/, /j/)

Nasal followed by front vowel followed by voiced unaspirated stop.

par ( /p/, /a/, /r/ )

Unvoiced unaspirated bilabial stop followed by middle vowel followed by semivowel.

hai ( /h/, /ai/ )

Fricative followed by diphthong.

Table 1: Vowel classification
Vowel Type   Sound Units
Short        a, i, u, e, o
Long         A, I, U, E, O
Diphthongs   ai, au


Table 2: Consonant classification
Place of       Manner of Articulation
Articulation   Unvoiced                 Voiced                   Nasals  Semi-vowels  Fricatives
               Unaspirated  Aspirated   Unaspirated  Aspirated
Velar          k            kh          g            gh          kn      -            h
Palatal        ch           chh         j            jh          chn     y            sh
Alveolar       T            Th          D            Dh          Tn      r            shh
Dental         t            th          d            dh          n       l            s
Bilabial       p            ph          b            bh          m       v            -
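
The consonant classification in Table 2 can be expressed as a small lookup. This is a minimal sketch that produces descriptions in the style of the kitAb example. The dictionary and function names are hypothetical, and only the consonants listed in Table 2 are covered.

```python
# Stop consonants per place of articulation, in the column order of Table 2:
# (unvoiced unaspirated, unvoiced aspirated, voiced unaspirated, voiced aspirated)
STOPS = {
    "velar": ("k", "kh", "g", "gh"),
    "palatal": ("ch", "chh", "j", "jh"),
    "alveolar": ("T", "Th", "D", "Dh"),
    "dental": ("t", "th", "d", "dh"),
    "bilabial": ("p", "ph", "b", "bh"),
}
STOP_TYPES = (
    "unvoiced unaspirated",
    "unvoiced aspirated",
    "voiced unaspirated",
    "voiced aspirated",
)
NASALS = {"kn": "velar", "chn": "palatal", "Tn": "alveolar",
          "n": "dental", "m": "bilabial"}
SEMIVOWELS = {"y": "palatal", "r": "alveolar", "l": "dental", "v": "bilabial"}
FRICATIVES = {"h": "velar", "sh": "palatal", "shh": "alveolar", "s": "dental"}

def describe_consonant(symbol):
    """Return a place/manner description for a consonant symbol of Table 2."""
    for place, symbols in STOPS.items():
        if symbol in symbols:
            return f"{STOP_TYPES[symbols.index(symbol)]} {place} stop"
    for table, manner in ((NASALS, "nasal"),
                          (SEMIVOWELS, "semivowel"),
                          (FRICATIVES, "fricative")):
        if symbol in table:
            return f"{table[symbol]} {manner}"
    raise KeyError(f"{symbol!r} is not listed in Table 2")

# Consonants of "kitAb"
for c in ("k", "t", "b"):
    print(f"/{c}/ : {describe_consonant(c)}")
```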



3 Description of time varying excitation

The time-varying excitation is described for the sound units present in the utterance using Table 4.

Example: kitAb mEj par hai

/k/ : Release of velar constriction
/i/ : Vocal folds vibration
/t/ : Release of dental constriction
/A/ : Vocal folds vibration
/b/ : Vocal folds vibration + release of bilabial constriction
/m/ : Vocal folds vibration + lowered velum + closure of lips
/E/ : Vocal folds vibration
/j/ : Vocal folds vibration + release of palatal constriction
/p/ : Release of bilabial constriction
/a/ : Vocal folds vibration
/r/ : Vocal folds vibration + narrow constriction at alveolar ridge
/h/ : Narrow constriction at velum (turbulent)
/ai/ : Vocal folds vibration

4 Time varying system description

The time-varying system characteristics of a sound unit are described by specifying the positions of the different articulators and the shape of the vocal tract while that sound unit is being produced. For the production of vowels, the time-varying system characteristics are described by the extent of opening of the oral cavity and by the position and height of the tongue hump in the oral cavity. The position and height of the tongue hump for different vowels can be described using Table 3. The time-varying system is described for the sound units present in the utterance using Table 5.

Example: kitAb mEj par hai

Time varying characteristics from the signal
/k/ : Complete closure at velum.
/i/ : Tongue hump is high and is in front position of the vocal tract (VT) system, VT system is narrowly open.
/t/ : Complete closure at dental position.
/A/, /a/ : Tongue hump is low and is in central position of the VT system, VT system is widely open.
/b/ : Complete closure at lips.
/m/ : Opening of velum (nasal cavity) and complete closure at lips.
/E/ : Tongue hump is medium and is in front position of the VT system, VT system is moderately open.
/j/ : Complete closure at palatal position.
/p/ : Complete closure at lips.
/r/ : Partial closure of VT with tongue tip at alveolar ridge.
/h/ : Narrow constriction at velum.
/ai/ : Tongue hump glides from the low central position of /a/ toward the high front position of /i/, VT system goes from widely open to narrowly open.


Table 3: Position and height of the tongue hump for producing different vowels

        Front   Central   Back
High    /i/     -         /u/
Mid     /e/     -         /o/
Low     -       /a/       -

Table 4: Excitations and the corresponding sounds
Excitation type Sound units
Vocal folds vibration Vowels
Release of velar constriction k,kh
Release of palatal constriction ch,chh
Release of alveolar constriction T,Th
Release of dental constriction t,th
Release of bilabial constriction p,ph
Release of velar constriction and vocal folds vibration g,gh
Release of palatal constriction and vocal folds vibration j,jh
Release of alveolar constriction and vocal folds vibration D,Dh
Release of dental constriction and vocal folds vibration d,dh
Release of bilabial constriction and vocal folds vibration b,bh
Vocal folds vibration, velum is lowered and constriction at velum kn
Vocal folds vibration, velum is lowered and constriction at palatal chn
Vocal folds vibration, velum is lowered and constriction at alveolar Tn
Vocal folds vibration, velum is lowered and constriction at dental n
Vocal folds vibration, velum is lowered and constriction at lips m
Vocal folds vibration and narrow constriction at palatal y
Vocal folds vibration and narrow constriction at alveolar ridge r
Vocal folds vibration and narrow constriction at dental l
Vocal folds vibration and narrow constriction at lips v
Narrow constriction at velum (turbulent) h
Narrow constriction at palatal (turbulent) sh
Narrow constriction at alveolar (turbulent) shh
Narrow constriction at dental (turbulent) s
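
The rows of Table 4 follow a regular pattern in manner, place and voicing, which can be sketched as a single function. This is only an illustration of that pattern. The parameter names `manner`, `place` and `voiced` are hypothetical, and the wording of the returned strings follows Table 4 only approximately (for example, /r/ is listed there with "alveolar ridge").

```python
# Constriction sites as they are worded in Table 4 (e.g. "velum", "lips").
CONSTRICTION_SITE = {
    "velar": "velum",
    "palatal": "palatal",
    "alveolar": "alveolar",
    "dental": "dental",
    "bilabial": "lips",
}

def excitation(manner, place=None, voiced=False):
    """Compose an excitation description in the style of Table 4."""
    if manner == "vowel":
        return "vocal folds vibration"
    if manner == "stop":
        release = f"release of {place} constriction"
        # Voiced stops add vocal folds vibration to the release.
        return f"{release} and vocal folds vibration" if voiced else release
    site = CONSTRICTION_SITE[place]
    if manner == "nasal":
        return f"vocal folds vibration, velum is lowered and constriction at {site}"
    if manner == "semivowel":
        return f"vocal folds vibration and narrow constriction at {site}"
    if manner == "fricative":
        return f"narrow constriction at {site} (turbulent)"
    raise ValueError(f"unknown manner: {manner}")

# /b/ in "kitAb": voiced bilabial stop
print(excitation("stop", "bilabial", voiced=True))
# /m/ in "mEj": bilabial nasal
print(excitation("nasal", "bilabial"))
```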

Table 5: System characteristics and the corresponding sounds
Vocal tract system characteristics Sound units
Tongue hump is low and it is in central position of the vocal tract (VT) system, VT system is widely open a
Tongue hump is high and it is in front position of the VT system, VT system is narrowly open i
Tongue hump is medium and it is in front position of the VT system, VT system is moderately open e
Tongue hump is high and it is in back position of the VT system, VT system is narrowly open and cylindrical in shape u
Tongue hump is medium and it is in back position of the VT system, VT system is moderately open and cylindrical in shape o
Complete closure at velum k,kh,g,gh
Complete closure at palatal ch,chh,j,jh
Complete closure at alveolar T,Th,D,Dh
Complete closure at dental t,th,d,dh
Complete closure at lips p,ph,b,bh
Complete closure at velum and opening of nasal cavity kn
Complete closure at palatal and opening of nasal cavity chn
Complete closure at alveolar and opening of nasal cavity Tn
Complete closure at dental and opening of nasal cavity n
Complete closure at lips and opening of nasal cavity m
Narrow constriction at velum h
Narrow constriction at palatal sh
Narrow constriction at alveolar shh
Narrow constriction at dental s
Partial closure of VT with tongue hump at palatal y
Partial closure of VT with tongue tip at alveolar ridge r
Partial closure of VT with tongue tip at dental l
Partial closure of VT with lower lip and upper teeth v


Procedure

Part-1: Acoustic-Phonetic Labeling (pre-labeled examples)

  1. Select one of the prelabeled examples from the drop down box. Click on the go button.
  2. Identify the voiced/unvoiced/plosive/silence regions of the speech waveform by repeatedly zooming in and listening to different segments. Observe the time-varying nature of the waveform from one region to the other.
  3. Select the correct excitation type, manner of articulation (MoA) and place of articulation (PoA) for each of the phonemes listed in the table.
  4. Click on the phoneme symbol (first column of the table) to zoom fit to the presegmented label, and observe the signal characteristics.
  5. Observe the time-varying excitation and system characteristics from the spectrogram.
  6. Write a brief note on the observations.
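
For readers who want to see what the spectrogram in step 5 represents, the following is a minimal sketch of a short-time magnitude spectrum computed with a plain DFT. The frame length, hop size, Hamming window and test tone are assumptions made for illustration. In practice an FFT routine such as `numpy.fft.rfft` would replace the inner loop.

```python
import cmath
import math

def spectrogram(signal, frame_len=64, hop=32):
    """Return a list of magnitude spectra, one per windowed frame."""
    window = [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]  # Hamming window
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        # Naive DFT over the non-negative frequency bins only.
        for k in range(frame_len // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames

fs = 8000  # assumed sampling rate in Hz
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(256)]
spec = spectrogram(tone)

# Index of the strongest bin in each frame; bin spacing is fs / frame_len = 125 Hz.
peak_bins = [max(range(len(s)), key=s.__getitem__) for s in spec]
print(peak_bins)
```

For the 1000 Hz test tone every frame peaks at bin 8, that is 8 x 125 Hz. For speech, the peak locations move from frame to frame, which is the time-varying behaviour to look for in the spectrogram.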

Part-2: Acoustic-Phonetic labeling without feedback

  1. Write down a sentence of your choice in one of the chosen languages.
  2. Record the sentence you chose in the previous step.
  3. Observe the signal characteristics of different phones in the sentence and choose the appropriate excitation type, MoA and PoA for each phoneme.
  4. Repeat the experiment by recording different sentences covering different categories of sound units.


Experiment

Observations
  • Every phoneme can be completely and uniquely described by using the acoustic-phonetic descriptors.

  • Time varying nature of the excitation source can be clearly observed from the speech waveform.
    1. Quasiperiodic waveform during voiced speech
    2. Random noise-like waveform during unvoiced speech
    3. Lack of any significant signal amplitude during silence or nonspeech region
    4. Low or negligible signal amplitude (closure region) followed by a burst during the production of plosive sounds.

  • Time varying nature of the vocal tract system can be observed from the change in waveform shape within each glottal cycle (pitch cycle) for different voiced sounds.
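
The glottal (pitch) cycle mentioned above can be located numerically. The following is a minimal sketch, assuming an autocorrelation peak search over a plausible pitch range. The function name, the 60 to 400 Hz search range and the synthetic test frame are assumptions, not part of the experiment.

```python
import math

def pitch_period(frame, fs, f_min=60.0, f_max=400.0):
    """Estimate the pitch period in samples as the lag of the autocorrelation peak."""
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)

    def autocorr(lag):
        return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

    return max(range(lag_min, lag_max + 1), key=autocorr)

fs = 8000    # assumed sampling rate in Hz
f0 = 100.0   # fundamental frequency of the synthetic frame

# Synthetic quasi-periodic frame: fundamental plus two weaker harmonics.
frame = [
    sum(math.sin(2 * math.pi * h * f0 * n / fs) / h for h in (1, 2, 3))
    for n in range(400)
]

lag = pitch_period(frame, fs)
print(lag, fs / lag)
```

Once the period is known, consecutive cycles of a voiced segment can be overlaid to compare their shapes across different vowels.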

Assessment
  1. Voice onset time (VOT) in the context of CV units, where C denotes a stop consonant and V denotes a vowel, is defined as the interval of time between the onset of the burst (release of closure) and the onset of voicing (vibration of the vocal folds). In the case of unvoiced stop consonants (/k/, /ch/, /T/, /t/, /p/), the onset of the burst is followed by the onset of voicing, and VOT is measured as a positive value. In the case of voiced stop consonants (/g/, /j/, /D/, /d/, /b/), the onset of voicing precedes the onset of the burst, and hence VOT is measured as a negative value.

    1. Record the CV units /ka/, /cha/, /Ta/, /ta/ and /pa/ (unvoiced unaspirated stop consonants followed by a fixed vowel /a/). Measure the VOT for each of the CV units. Write down your observations.
    2. Record the CV units /kha/, /chha/, /Tha/, /tha/ and /pha/ (unvoiced aspirated stop consonants followed by a fixed vowel /a/). Measure the VOT for each of the CV units. How do they compare with their unaspirated counterparts? Write down your observations.
    3. Record the CV units /ga/, /ja/, /Da/, /da/ and /ba/ (voiced unaspirated stop consonants followed by a fixed vowel /a/). Measure the VOT for each of the CV units. How do they compare with their unvoiced counterparts? Write down your observations.
    4. Repeat the exercise for voiced aspirated stops and compare with their unaspirated and unvoiced counterparts. Write down your observations.

  2. Record the vowels /a/, /i/ and /u/. Zoom in to the waveforms and observe the quasi-periodic nature of the signal. Can you identify or discriminate between the three vowels based on the waveform?
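
For the VOT measurements in item 1, once the burst onset and the voicing onset have been marked on the waveform, the value follows directly from the definition. This is a minimal sketch. The function name, the sampling rate and the sample indices are hypothetical and are not measured data.

```python
def voice_onset_time_ms(burst_onset, voicing_onset, fs):
    """Return VOT in milliseconds from two sample indices.

    Positive when voicing starts after the burst (unvoiced stops),
    negative when voicing starts before the burst (voiced stops).
    """
    return 1000.0 * (voicing_onset - burst_onset) / fs

fs = 16000  # assumed sampling rate in Hz

# Hypothetical hand-marked sample indices for two CV units.
vot_unvoiced = voice_onset_time_ms(burst_onset=4000, voicing_onset=4320, fs=fs)
vot_voiced = voice_onset_time_ms(burst_onset=4000, voicing_onset=2800, fs=fs)
print(vot_unvoiced, vot_voiced)
```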
