Comparative effectiveness of language modeling algorithms on acoustic-level error samples
Speech recognition programs use various language models to improve the accuracy of converting sound to text. Because acoustic interpretation has significant flaws that are unlikely to be resolved anytime soon, the language model, as a secondary stage of analysis, remains highly influential in determining the overall accuracy of the speech recognition process. The earliest speech recognition programs used tri-gram language models, later refined into general n-gram and hidden Markov models, to improve accuracy. The latest advances in speech recognition have come from deemphasizing the n-gram model in favor of deep learning and deep neural network models. The goal of this research is to show whether placing more weight on language modeling, demonstrated here with a simple tri-gram language model applied after the neural network model, can significantly improve the accuracy of voice dictation. To examine this, experiments were performed to classify voice dictation errors into several categorical types and then apply language models trained on different language sources, ranging from very general to very specific. Error sentences were created with Dragon NaturallySpeaking 12.5 by logging the errors that occurred during dictation of sample English-language corpora of different types. These language files were then analyzed with the older bi-gram and tri-gram language models to determine which produced the greatest statistical difference between incorrect and correct sentences. An analysis of the mistakes in the output of Dragon NaturallySpeaking 12.5 shows that tri-gram modeling favors the correct sentences over the error sentences.
Without access to the alternative choices rejected by Dragon NaturallySpeaking, no conclusion can be drawn as to the degree to which tri-gram modeling might introduce new errors, but the test results show that applying a simple tri-gram language model in addition to the neural network and language model analysis already being performed would significantly reduce the number of false positives.
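The scoring idea behind the experiments above can be sketched as follows. This is a minimal illustration, not the study's actual implementation: the toy corpus, function names, and the add-alpha smoothing choice are all assumptions made here. A tri-gram model trained on sample text should assign a higher smoothed log-probability to a correct sentence than to a recognition-error variant of it.

```python
from collections import Counter
from math import log

def train_trigram_counts(sentences):
    """Count trigrams and their bigram contexts over whitespace-tokenized
    sentences, padding each sentence with start/end markers."""
    tri, bi = Counter(), Counter()
    for s in sentences:
        toks = ["<s>", "<s>"] + s.split() + ["</s>"]
        for i in range(len(toks) - 2):
            bi[(toks[i], toks[i + 1])] += 1
            tri[(toks[i], toks[i + 1], toks[i + 2])] += 1
    return tri, bi

def log_prob(sentence, tri, bi, vocab_size, alpha=1.0):
    """Log-probability of a sentence under the trigram counts with
    add-alpha (Laplace) smoothing, so unseen trigrams score low but nonzero."""
    toks = ["<s>", "<s>"] + sentence.split() + ["</s>"]
    lp = 0.0
    for i in range(len(toks) - 2):
        num = tri[(toks[i], toks[i + 1], toks[i + 2])] + alpha
        den = bi[(toks[i], toks[i + 1])] + alpha * vocab_size
        lp += log(num / den)
    return lp

# Tiny hypothetical training text standing in for a real corpus
corpus = [
    "recognize speech with a language model",
    "the language model scores each sentence",
]
tri, bi = train_trigram_counts(corpus)
vocab = {w for s in corpus for w in s.split()} | {"</s>"}

correct = "the language model scores each sentence"
error = "the language model sores each sentience"  # simulated dictation error
print(log_prob(correct, tri, bi, len(vocab)) >
      log_prob(error, tri, bi, len(vocab)))  # prints True
```

A post-recognition filter in this spirit would score each hypothesis the recognizer emits and flag (or re-rank) those whose tri-gram log-probability falls well below the alternatives, which is the comparison the error-versus-correct sentence analysis above measures.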