The Accuracy of Speech Recognition Dictation Software for Minority Language Users

By Elizabeth Meddeb.

Published by The Technology Collection

Format and Price:
Article (Print): US$10.00
Article (Electronic): US$5.00

To determine the role that discourse plays in shaping the output of speech recognition dictation software, I asked four participants, three of whom were minority language speakers (one non-native speaker of English and two speakers of African American Vernacular English), to enroll fully in an automatic speech recognition dictation program and subsequently compose a series of texts over time, using IBM’s ViaVoice 98 dictation software. I designed the task protocol to require these participants to compose two genres of writing (a summary and a response) based on the same stimulus text. To gauge software learning, I used recognition accuracy as the evaluation criterion and the word error rate (WER) as the measure. I then calculated the software’s reduction of (transcription) error from the first speech session to the last. These calculations indicated the robustness of the software’s speech engine and offered some indication of how the participants’ discourse (accent, speech style, and genre of speaking/writing) interacted with the software’s transcription.
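For readers unfamiliar with the metric, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between the reference text and the software’s transcription, divided by the number of words in the reference; the reduction-of-error figure is then the relative drop in WER from the first session to the last. The following Python sketch is a generic illustration of these standard calculations, not code or data from the study; the sample strings and the 0.30/0.18 session figures are hypothetical.

```python
# Illustrative sketch of the standard WER formula (not the study's own tooling).
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

def error_reduction(first_wer: float, last_wer: float) -> float:
    """Relative reduction in transcription error from first to last session."""
    return (first_wer - last_wer) / first_wer

# Hypothetical example: two substitutions over six reference words -> WER = 0.33.
print(wer("the cat sat on the mat", "the cat sat in the hat"))
# Hypothetical session WERs of 0.30 and 0.18 -> a 40% reduction in error.
print(error_reduction(0.30, 0.18))
```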
With respect to the role that the participants’ discourse played in shaping the transcription accuracy of the software, results indicated that the software reacted differently to the various ethno-linguistic speech backgrounds represented in this study. In terms of WER, the software recognized the speaker of “General American” best. This participant proved himself a “sheep” in that he maintained a consistent pronunciation style throughout the study. The non-native speaker of English had the second-lowest average WER, followed very closely by one of the AAVE speakers. In terms of text type, the responses yielded more accurate transcriptions than the summaries. The participants self-reported that the responses “flowed easier” and were more “spoken like.” The interaction of text type and recognition accuracy is not a well-researched phenomenon and may be a fruitful area for further inquiry.

Keywords: Speech Recognition, Discourse Analysis, Genre Studies

The International Journal of Technology, Knowledge and Society, Volume 4, Issue 6, pp. 125-138. Article: Print (Spiral Bound). Article: Electronic (PDF File; 999.760KB).

Dr. Elizabeth Meddeb

Assistant Professor, Foreign Languages/ESL/Humanities, York College of the City University of New York, Jamaica, NY, USA

Elizabeth Meddeb is Assistant Professor and Coordinator of English as a Second Language at York College of the City University of New York. She teaches advanced composition courses to non-native speakers of English, as well as introductory courses in linguistics and humanities. Her research interests include the interaction between technology and language use. She is currently working on a research grant that investigates how speech recognition dictation technology shapes both spoken and written language use for non-native speakers of English. This study is an outgrowth of her dissertation research at Columbia University and her work experience at IBM’s TJ Watson Research Center.
