The objective of this experiment is to get a feel for the difficulty of manually segmenting speech into linguistically meaningful units such as phonemes, syllables and words. The goal of speech signal-to-symbol (S-S) transformation by a machine is to accomplish this task automatically.
The experiment consists of three parts. In the first part, presegmented speech data is provided: the speech waveform of an utterance in the chosen language is displayed, and the segmentation boundaries of the chosen units (phoneme, syllable or word), expressed as sample numbers, are listed in a table adjacent to the waveform.
This part consists of selecting a segment of speech using the buttons provided and verifying, by listening to the selected segment, the boundaries of the sound units given in the table. Note that the time axis is marked in samples and that the speech signal is sampled at 16000 samples/sec.
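Since the time axis is in samples, it can help to convert boundary values to seconds or milliseconds while listening. The following is a minimal sketch of that arithmetic, assuming only the 16000 samples/sec rate stated above; the function names are illustrative and are not part of the experiment software.

```python
SAMPLE_RATE = 16000  # samples per second, as stated in the text


def samples_to_seconds(n_samples, sample_rate=SAMPLE_RATE):
    """Convert a sample number (or sample count) to time in seconds."""
    return n_samples / sample_rate


def seconds_to_samples(seconds, sample_rate=SAMPLE_RATE):
    """Convert a time in seconds to the nearest sample number."""
    return int(round(seconds * sample_rate))


def segment_duration_ms(start, end, sample_rate=SAMPLE_RATE):
    """Duration in milliseconds of a segment given by two sample numbers."""
    return 1000.0 * (end - start) / sample_rate
```

For example, a segment running from sample 3200 to sample 4800 lasts 1600 samples, which is 100 ms at this sampling rate.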
The chosen utterance is given in the script of that language, and the corresponding transliteration is given in Roman script. The sequence of the chosen subword units is also given below the transliteration.
The displayed speech signal can be played for listening using the play button in the transport panel. The record button is used for recording any new utterance for segmentation. Of course, the segmentation boundaries for the recorded utterance will not be available in the table.
The arrow button at the leftmost position at the top of the waveform panel selects the default cursor mode. Use it to return to the normal/default mode from any of the other modes chosen by clicking the buttons adjacent to it, such as zoom-in (+ button) and zoom-out (- button).
The + button zooms in on the waveform at the location chosen with the mouse pointer (click of the left mouse button).
The - button zooms out of the waveform (the reverse of the + or zoom-in operation).
The next button is the zoom-to-fit button, which selects a portion of the signal and simultaneously zooms in to display the selected portion.
The next button (second from right) selects a region of the speech signal without zooming into the selected region.
The last (rightmost) button restores the display to the entire waveform.
By selecting the region of each subword unit shown in the table, observe the waveform and listen to the sound of that unit.
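Outside the interface, the same selection amounts to slicing the signal at the tabulated sample numbers. The sketch below illustrates this under stated assumptions: the boundary table, its unit labels and its sample values are made up for illustration, the signal is a placeholder list of zeros, and the end boundary is treated as exclusive (the experiment does not specify this).

```python
# Hypothetical boundary table: (unit label, start sample, end sample).
# Labels and values are illustrative only, not taken from the experiment.
boundary_table = [
    ("ka", 0, 3200),
    ("ma", 3200, 8000),
    ("la", 8000, 12000),
]


def extract_units(signal, table):
    """Return a list of (label, samples) pairs, one per row of the table.

    Assumes each segment covers samples start .. end-1 (end exclusive).
    """
    segments = []
    for label, start, end in table:
        if not 0 <= start < end <= len(signal):
            raise ValueError(f"invalid boundaries for {label!r}: {start}, {end}")
        segments.append((label, signal[start:end]))
    return segments


# Placeholder signal: 12000 samples of silence standing in for real speech.
signal = [0] * 12000
segments = extract_units(signal, boundary_table)
```

Each extracted segment can then be plotted or played back on its own, which mirrors what the region-select button and play button do in the interface.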
Part-2 of the experiment involves locating the boundaries of the subword units by listening to them and entering the values (sample numbers) into the table.
If your marking deviates from the actual (reference) marking by more than 30 ms (480 samples at 16000 samples/sec), the system prompts with a message stating by how many samples it deviates. Note the message, then adjust and re-enter the value in the table until the system accepts it.
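The acceptance rule can be sketched as a simple comparison in samples. This is only an illustration of the rule as described, not the system's actual implementation: the function names are hypothetical, and the sign convention (positive means the marking is later than the reference) is an assumption.

```python
SAMPLE_RATE = 16000   # samples per second, as stated in the text
TOLERANCE_MS = 30     # allowed deviation, as stated in the text


def tolerance_in_samples(tol_ms=TOLERANCE_MS, sample_rate=SAMPLE_RATE):
    """Convert the tolerance from milliseconds to a number of samples."""
    return int(round(tol_ms * sample_rate / 1000))


def check_boundary(marked, reference):
    """Return (accepted, deviation) for one boundary marking.

    deviation is marked - reference in samples (assumed sign convention);
    the marking is accepted when |deviation| does not exceed the tolerance.
    """
    deviation = marked - reference
    accepted = abs(deviation) <= tolerance_in_samples()
    return accepted, deviation
```

With these values the tolerance is 480 samples, so a marking at sample 10500 against a reference of 10000 (a deviation of 500 samples) would be rejected, while a marking at 10300 would be accepted.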
Part-3 of the experiment involves transliterating a sentence in any one of the chosen Indian languages using the English (Roman) alphabet, syllabifying the utterance, and recording, listening to and segmenting the speech signal.