The Waveform Model of Vowel Perception and Production
by Michael A. Stokes
This book presents a new model of vowel perception and production derived from visual cues identified in waveform displays. Beyond describing waveform displays of vowels in more detail than previous accounts, the book reports experimental evidence that the concepts in the model support nearly 100% vowel identification accuracy across 20 male talkers. The content will interest several academic fields, including Cognitive Science, Psychology, Linguistics, Speech and Hearing, Language Acquisition, Neurolinguistics, Phonetics, and areas within Physics and Mathematics. Outside these fields, the model of vowel perception presented here could be used to improve the accuracy and speed of existing speech recognition systems, or to build a new speech recognition program.

Many speech recognition programs are built on simple statistical models such as Hidden Markov Models (HMMs), which ignore any theoretical basis for speech recognition. The Waveform Model differs from HMM approaches in that it has a theoretical basis rooted in articulation. This gives it potentially more promise than simple HMMs, which only measure overall similarity between waveforms and try to match them to phonemes and words. Furthermore, many speech recognition programs require extensive training by a single user (in quiet conditions) to reach just over 90% accuracy, which is still relatively poor performance. The Waveform Model requires no training, works across talkers, and achieves higher vowel identification accuracy than has been reported for speech recognition systems. In summary, the Waveform Model is an innovative approach that is new to the literature and the research community.
About the Author
The author has analyzed more than 15,000 speech waveforms, and this work has appeared in a number of presentations and publications. Some of it was the initial research showing that vowels can be identified by visually inspecting their waveforms (Stokes, 1996, 2001). Talkers have also been identified from their waveforms (i.e., voiceprints, analogous to fingerprints; Stokes, 2002). Since 1998, the author has worked as a computer programmer on a number of international business applications.