Speech Production Mechanism

1. Identifying the Voiced/Unvoiced/Plosive/Silence regions

From the time-domain waveform, the four regions can be identified as follows:

  • Voiced: quasi-periodic regions with high amplitude

  • Unvoiced: non-periodic, random, noise-like regions

  • Plosive: a noise-burst-like signal, indicating the sudden release of a constriction at some place in the vocal tract system

  • Silence: no speech signal (zero or relatively small amplitude)
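The cues above can be turned into a rough frame-by-frame labeller. The sketch below is only an illustration under stated assumptions. It uses short-time energy to detect silence, and a zero-crossing rate together with a normalized autocorrelation peak over 60 to 400 Hz pitch lags to detect quasi-periodicity. The thresholds are hand-picked, and a noise-like frame is marked as a plosive burst only when it directly follows silence. The function names and all threshold values are hypothetical and are not taken from the text.

```python
import math
import random


def frame_features(frame, fs, f0_min=60.0, f0_max=400.0):
    """Return (energy, zero-crossing rate, periodicity) for one frame."""
    n = len(frame)
    energy = sum(x * x for x in frame) / n
    # Fraction of adjacent sample pairs whose signs differ.
    zcr = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:])) / (n - 1)
    # Quasi-periodicity: highest normalized autocorrelation over plausible pitch lags.
    periodicity = 0.0
    for lag in range(int(fs / f0_max), min(int(fs / f0_min), n - 1) + 1):
        x, y = frame[:-lag], frame[lag:]
        den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
        if den > 0:
            periodicity = max(periodicity, sum(a * b for a, b in zip(x, y)) / den)
    return energy, zcr, periodicity


def classify_frames(signal, fs, frame_len=400, silence_energy=1e-4,
                    min_periodicity=0.5, max_voiced_zcr=0.2):
    """Label non-overlapping frames as silence, voiced, unvoiced or plosive."""
    labels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        energy, zcr, periodicity = frame_features(signal[start:start + frame_len], fs)
        if energy < silence_energy:
            label = "silence"
        elif periodicity > min_periodicity and zcr < max_voiced_zcr:
            label = "voiced"
        elif labels and labels[-1] == "silence":
            label = "plosive"  # noise-like frame right after silence: treated as a burst
        else:
            label = "unvoiced"
        labels.append(label)
    return labels


# Synthetic signal at 16 kHz: 25 ms silence, 25 ms noise, 50 ms of a 200 Hz tone.
fs = 16000
rng = random.Random(0)
signal = ([0.0] * 400
          + [rng.uniform(-0.3, 0.3) for _ in range(400)]
          + [0.5 * math.sin(2 * math.pi * 200 * i / fs) for i in range(800)])
print(classify_frames(signal, fs))
```

A real recording would need thresholds tuned to its level and sampling rate. The silence-then-noise rule also mislabels an unvoiced fricative that starts after a pause, and a burst longer than one frame is labelled plosive only in its first frame.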

2. Acoustic phonetic description

Acoustic phonetics (AP) deals with the study of speech sounds by relating the acoustic signal characteristics to the speech production mechanism.

Speech sounds may be broadly categorized as vowels and consonants. Vowels are produced with the least constriction in the oral cavity, whereas consonants are produced with different degrees of constriction in the oral cavity. Diphthongs are another category of sounds, formed by combining two vowels with a continuous change from one vowel to the other. Consonants can be stops, nasals, fricatives or approximants (semivowels), with decreasing degrees of closure in that order.

Stop sounds are consonants with a complete closure in the oral cavity, which is released as a sudden burst or plosion. Stops can be categorized as either unvoiced (/k/, /ch/, /T/, /t/, /p/) or voiced (/g/, /jh/, /D/, /d/, /b/), depending on the absence or presence of voicing (vibration of the vocal folds) during the closure. Stops are also categorized as velar (/k/, /kh/, /g/, /gh/), palatal (/ch/, /chh/, /jh/, /jhh/), alveolar (/T/, /Th/, /D/, /Dh/), dental (/t/, /th/, /d/, /dh/) or bilabial (/p/, /ph/, /b/, /bh/), based on the place of closure, also known as the place of articulation (PoA). Based on the manner of articulation (MoA), i.e., the absence or presence of aspiration (a strong turbulent burst of air) at the release of the closure, stops may be categorized as unaspirated or aspirated (e.g., /k/ as against /kh/).

Nasals are also stop sounds with a complete closure in the oral cavity, except that the velum is lowered, connecting the otherwise isolated nasal cavity to the oral cavity. Nasals can also be classified as velar, palatal, alveolar, dental or bilabial based on PoA (/gn/, /jn/, /N/, /n/, /m/, respectively).

Fricatives are produced by forcing a turbulent, noise-like air stream through a narrow constriction at some point in the oral cavity. They can be velar, palatal, alveolar, dental or bilabial (/h/, /shh/, /sh/, /s/, /f/).

Approximants, or semivowels, are produced with slightly less constriction than fricatives, but more constriction than vowels. Based on PoA, approximants can be categorized as palatal, alveolar, dental or bilabial (/y/, /r/, /l/, /v/, respectively).
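The consonant descriptions used in this section are built from a few features: voicing, aspiration, PoA and manner. As a minimal sketch, these features can be stored per sound unit in a hypothetical `Consonant` record and joined into the phrasing used in the worked example of this section ("kitAb mEj par hai"). Voicing and aspiration are filled in only for stops, because the text states them only for stops. The class, its field names and the dictionary are illustrative assumptions, not part of the original material.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Consonant:
    place: str                        # PoA: velar, palatal, alveolar, dental or bilabial
    manner: str                       # stop, nasal, fricative or approximant
    voiced: Optional[bool] = None     # stated only for stops, as in the text
    aspirated: Optional[bool] = None  # aspiration (MoA) applies to stops

    def describe(self) -> str:
        """Join the features in the word order used by the worked example."""
        words = []
        if self.voiced is not None:
            words.append("voiced" if self.voiced else "unvoiced")
        if self.aspirated is not None:
            words.append("aspirated" if self.aspirated else "unaspirated")
        words += [self.place, self.manner]
        text = " ".join(words)
        return text[0].upper() + text[1:]


# Consonants of "kitAb mEj par hai", with the features given in this section.
EXAMPLE_CONSONANTS = {
    "k": Consonant("velar", "stop", voiced=False, aspirated=False),
    "t": Consonant("dental", "stop", voiced=False, aspirated=False),
    "b": Consonant("bilabial", "stop", voiced=True, aspirated=False),
    "m": Consonant("bilabial", "nasal"),
    "j": Consonant("palatal", "stop", voiced=True, aspirated=False),
    "p": Consonant("bilabial", "stop", voiced=False, aspirated=False),
    "r": Consonant("alveolar", "approximant"),
    "h": Consonant("velar", "fricative"),
}

for symbol, consonant in EXAMPLE_CONSONANTS.items():
    print(f"/{symbol}/: {consonant.describe()}")
```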

For vowels, the AP description may be given in terms of duration (short or long), as in Table 1. Diphthongs are sounds that make a continuous transition from one vowel position to another, and are equivalent to long vowels in terms of duration. The AP description of vowels may also be given in terms of the position of the tongue hump within the oral cavity (front, central, back) and the height of the tongue hump (high, mid, low), as given in Table 2. In the case of diphthongs, the tongue position changes from that of one vowel to that of the other. The AP description of consonants is given in terms of place of articulation (PoA) and manner of articulation (MoA), as shown in Table 3.
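Vowel descriptions can be composed in the same way from duration and tongue-hump features. The sketch below takes the short-vowel position and height from Table 5 (writing "medium" as "mid") and the diphthongs from Table 1. It assumes that each long vowel shares the tongue-hump position and height of its short counterpart, which is how the worked example of this section treats /A/ and /E/. The names `SHORT_VOWELS`, `tongue_position` and `describe_vowel` are hypothetical.

```python
# Short vowels from Table 5: (tongue-hump position, height); "medium" written as "mid".
SHORT_VOWELS = {
    "a": ("central", "low"),
    "i": ("front", "high"),
    "e": ("front", "mid"),
    "u": ("back", "high"),
    "o": ("back", "mid"),
}
DIPHTHONGS = {"ai", "au"}  # Table 1


def tongue_position(vowel: str) -> str:
    """Position-height label; a long vowel (upper case) reuses its short vowel's values (assumption)."""
    position, height = SHORT_VOWELS[vowel.lower()]
    return f"{position}-{height}"


def describe_vowel(unit: str) -> str:
    """Describe a vowel or diphthong by duration and tongue-hump features."""
    if unit in DIPHTHONGS:
        start, end = unit[0], unit[1]
        return f"diphthong ({tongue_position(start)} to {tongue_position(end)})"
    duration = "long" if unit.isupper() else "short"
    return f"{tongue_position(unit)} {duration} vowel"


for unit in ["i", "A", "E", "a", "ai"]:
    print(f"/{unit}/: {describe_vowel(unit)}")
```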

Example: kitAb mEj par hai

kitAb ( /k/, /i/, /t/, /A/, /b/)

Unvoiced unaspirated velar stop, followed by front-high short vowel, followed by unvoiced unaspirated dental stop, followed by central-low long vowel, followed by voiced unaspirated bilabial stop.

mEj ( /m/, /E/, /j/)

Bilabial nasal, followed by front-mid long vowel, followed by voiced unaspirated palatal stop.

par ( /p/, /a/, /r/ )

Unvoiced unaspirated bilabial stop, followed by central-low short vowel, followed by alveolar approximant.

hai ( /h/, /ai/ )

Velar fricative, followed by diphthong (central-low to front-high).

Table 1: Vowel classification

Vowel type | Sound units
Short | a, i, u, e, o
Long | A, I, U, E, O
Diphthongs | ai, au

Table 2: Position and height of the tongue hump for producing different vowels


Table 3: Consonant classification


3. Description of time-varying excitation

The main types of excitation are as follows:

  • Voicing - vibration of the vocal folds

  • Burst or plosion - release of oral closure

  • Frication - random noise-like turbulence

Vowels and approximants typically have voiced excitation, except when whispering. Approximants may have a small amount of frication along with voicing. Stop sounds typically have a plosive excitation, with or without voicing during the closure. Nasals are mostly voiced, and the burst or plosion at the release of the closure is weak or almost imperceptible. Fricatives have a turbulent excitation, which may also be referred to as voiceless or unvoiced excitation. Voicing may be present along with turbulence, as in the voiced fricatives of English (/v/, /zh/, /zhh/).
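The three excitation types can be imitated with very simple signals: an impulse train for voicing, white noise for frication, and a short noise burst after a silent closure for plosion. The sketch below combines them per sound class, following the description above. The class names, the 120 Hz pitch, the noise levels, the 5 ms burst and the choice to release the closure at the midpoint of the segment are all illustrative assumptions, not values from the text.

```python
import random

# Illustrative parameter values (assumptions, not from the text).
F0 = 120.0             # pitch of the voicing impulse train, Hz
BURST_SECONDS = 0.005  # duration of the plosive burst


def voicing(n, fs, f0=F0):
    """Impulse train at f0: a crude stand-in for vocal fold vibration."""
    period = int(round(fs / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def frication(n, rng, level=0.1):
    """Gaussian white noise: a stand-in for turbulence at a narrow constriction."""
    return [rng.gauss(0.0, level) for _ in range(n)]


def burst(n, fs, rng, level=0.5):
    """Silent closure for the first half, then a short noise burst at the release."""
    start = n // 2
    end = min(n, start + int(BURST_SECONDS * fs))
    return [rng.gauss(0.0, level) if start <= i < end else 0.0 for i in range(n)]


# Excitation components per sound class, following the description above.
COMPONENTS = {
    "vowel": ("voicing",),
    "nasal": ("voicing",),  # the weak nasal release burst is ignored
    "approximant": ("voicing", "weak_frication"),
    "unvoiced_stop": ("burst",),
    "voiced_stop": ("voicing", "burst"),
    "unvoiced_fricative": ("frication",),
    "voiced_fricative": ("voicing", "frication"),
}


def excitation(kind, n, fs, seed=0):
    """Sum the excitation components listed for one sound class."""
    rng = random.Random(seed)
    total = [0.0] * n
    for name in COMPONENTS[kind]:
        if name == "voicing":
            part = voicing(n, fs)
        elif name == "burst":
            part = burst(n, fs, rng)
        elif name == "frication":
            part = frication(n, rng)
        else:  # "weak_frication"
            part = frication(n, rng, level=0.02)
        total = [t + p for t, p in zip(total, part)]
    return total


fs = 16000
segment = excitation("unvoiced_stop", 1600, fs)  # 100 ms segment
print(sum(1 for x in segment if x != 0.0))  # number of non-zero burst samples
```

Because a voiced stop here adds the impulse train to the burst, its closure interval is not silent, which mirrors the distinction the text draws between voiced and unvoiced stops.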

Example: kitAb mEj par hai

The time-varying excitation is described for the sound units present in the utterance using Table 4.

/k/ : Release of velar constriction

/i/ : Vocal folds vibration

/t/ : Release of dental constriction

/A/ : Vocal folds vibration

/b/ : Vocal folds vibration + release of bilabial constriction

/m/ : Vocal folds vibration + lowered velum + closure of lips

/E/ : Vocal folds vibration

/j/ : Vocal folds vibration + release of palatal constriction

/p/ : Release of bilabial constriction

/a/ : Vocal folds vibration

/r/ : Vocal folds vibration + turbulence at alveolar ridge

/h/ : Turbulence at velar constriction

/ai/: Vocal folds vibration

Table 4: Excitations and the corresponding sounds

Excitation type | Sound units
Vocal folds vibration | Vowels
Release of velar constriction | k, kh
Release of palatal constriction | ch, chh
Release of alveolar constriction | T, Th
Release of dental constriction | t, th
Release of bilabial constriction | p, ph
Release of velar constriction and vocal folds vibration | g, gh
Release of palatal constriction and vocal folds vibration | j, jh
Release of alveolar constriction and vocal folds vibration | D, Dh
Release of dental constriction and vocal folds vibration | d, dh
Release of bilabial constriction and vocal folds vibration | b, bh
Vocal folds vibration, lowered velum and velar constriction | kn
Vocal folds vibration, lowered velum and palatal constriction | chn
Vocal folds vibration, lowered velum and alveolar constriction | Tn
Vocal folds vibration, lowered velum and dental constriction | n
Vocal folds vibration, lowered velum and closure at the lips | m
Vocal folds vibration and narrow palatal constriction | y
Vocal folds vibration and narrow constriction at the alveolar ridge | r
Vocal folds vibration and narrow dental constriction | l
Vocal folds vibration and narrow constriction at the lips | v
Narrow velar constriction (turbulent) | h
Narrow palatal constriction (turbulent) | sh
Narrow alveolar constriction (turbulent) | shh
Narrow dental constriction (turbulent) | s

4. Time-varying system description

Describing the time-varying system characteristics amounts to specifying the positions of the different articulators and the shape of the vocal tract while a particular sound unit is being produced. For vowels, the time-varying system characteristics are described by the extent of opening of the oral cavity, the position of the tongue hump in the oral cavity and the height of the tongue hump. The position and height of the tongue hump for different vowels can be described using Table 2. The time-varying system characteristics for consonants can be described using Table 3.
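One common way to make the "system" concrete is the source-filter model. In this model, the vocal tract shape is summarized by its resonance (formant) frequencies, and an excitation signal is passed through a cascade of second-order resonators tuned to them. The sketch below uses Klatt-style digital resonators normalized to unity gain at 0 Hz, driven by an impulse train standing in for voicing. The formant frequencies and bandwidths are rough, hypothetical values for an /a/-like shape (low tongue hump, wide opening) and an /i/-like shape (high front tongue hump, narrow opening). They are not values from this document.

```python
import math


def resonator(x, fs, freq, bandwidth):
    """Second-order digital resonator (Klatt-style), unity gain at 0 Hz."""
    r = math.exp(-math.pi * bandwidth / fs)
    b = 2.0 * r * math.cos(2.0 * math.pi * freq / fs)
    c = -r * r
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesize_vowel(formants, fs=16000, f0=120.0, duration=0.2):
    """Pass a voicing impulse train through a cascade of formant resonators."""
    n = int(duration * fs)
    period = int(round(fs / f0))
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bandwidth in formants:
        signal = resonator(signal, fs, freq, bandwidth)
    return signal


# Hypothetical (frequency, bandwidth) pairs in Hz for two vowel-like VT shapes.
VOWEL_FORMANTS = {
    "a": [(700.0, 80.0), (1200.0, 90.0)],   # low tongue hump, VT widely open
    "i": [(300.0, 60.0), (2300.0, 100.0)],  # high front tongue hump, VT narrowly open
}

for vowel, formants in VOWEL_FORMANTS.items():
    wave = synthesize_vowel(formants)
    print(vowel, len(wave), round(max(abs(s) for s in wave), 3))
```

Changing the formant list is the filter-side counterpart of moving the tongue hump. In the two example entries, the high front shape has a lower first formant and a higher second formant than the low central one.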

Example: kitAb mEj par hai

The time-varying system is described for the sound units present in the utterance using Table 5.

/k/ : Complete closure at velum position.

/i/ : Tongue hump is high and in the front position of the vocal tract (VT) system; VT system is narrowly open.

/t/ : Complete closure at dental position.

/A/ : Tongue hump is low and in the back position of the VT system; VT system is widely open.

/b/ : Closure at lips.

/m/ : Opening of velum and closure at lips.

/E/ : Tongue hump is at medium height and in the front position of the VT system; VT system is moderately open.

/j/ : Complete closure at palatal position.

/p/ : Closure at lips.

/a/ : Tongue hump is low and in the central position of the VT system; VT system is widely open.

/r/ : Narrow opening at alveolar ridge.

/h/ : Narrow opening at velum.

/ai/ : Tongue hump moves from the low, central position (VT system widely open) to the high, front position (VT system narrowly open).

Table 5: System characteristics and the corresponding sounds

Vocal tract system characteristics | Sound units
Tongue hump is low and in the central position of the vocal tract (VT) system; VT system is widely open | a
Tongue hump is high and in the front position of the VT system; VT system is narrowly open | i
Tongue hump is at medium height and in the front position of the VT system; VT system is moderately open | e
Tongue hump is high and in the back position of the VT system; VT system is narrowly open and cylindrical in shape | u
Tongue hump is at medium height and in the back position of the VT system; VT system is moderately open and cylindrical in shape | o
Complete closure at velum | k, kh, g, gh
Complete closure at palatal position | ch, chh, j, jh
Complete closure at alveolar position | T, Th, D, Dh
Complete closure at dental position | t, th, d, dh
Complete closure at lips | p, ph, b, bh
Complete closure at velum and opening of nasal cavity | kn
Complete closure at palatal position and opening of nasal cavity | chn
Complete closure at alveolar position and opening of nasal cavity | Tn
Complete closure at dental position and opening of nasal cavity | n
Complete closure at lips and opening of nasal cavity | m
Narrow constriction at velum | h
Narrow constriction at palatal position | sh
Narrow constriction at alveolar position | shh
Narrow constriction at dental position | s
Partial closure of VT with tongue hump at palatal position | y
Partial closure of VT with tongue tip at alveolar ridge | r
Partial closure of VT with tongue tip at dental position | l
Partial closure of VT with lower lip and upper teeth | v