BINGHAMTON, NY - Anyone who has used an automated airline reservation system has experienced the promise and the frustration inherent in today's automatic speech recognition technology. When it works, the computer "understands" that you want to book a flight to Austin rather than Boston, for example. Research conducted by Binghamton University's Stephen Zahorian aims to improve the accuracy of such programs.
Zahorian, a professor of electrical and computer engineering, recently received a grant of nearly half a million dollars from the Air Force Office of Scientific Research. The funds will support the two-year development of a multi-language, multi-speaker audio database that will be available for spoken-language processing research. Zahorian and his team plan to gather and annotate recordings of several hundred speakers each in English, Spanish and Mandarin Chinese.
"The challenge," he said, "is to get speech recognition working better in real-life situations."
That's why the samples in the new database will come from publicly available sources such as YouTube.
Zahorian's team will annotate each sample, creating a more detailed version of closed captioning that includes time stamps and descriptions of background sounds. Once a human listener has finished transcribing a recording, automatic speech recognition algorithms will be used to align the recording with the captions. Next, software will be developed to verify and correct errors in the time alignment.
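To give a rough sense of what checking a time alignment could involve, here is a minimal Python sketch. It is not the team's software: the `Segment` record, the `find_alignment_errors` function and the three checks are all assumptions made for illustration, and the sample captions are invented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """One hypothetical caption entry: a time span, its text, and an optional background note."""
    start: float                      # seconds from the beginning of the recording
    end: float                        # seconds from the beginning of the recording
    text: str
    background: Optional[str] = None  # e.g. "music", "traffic"


def find_alignment_errors(segments: List[Segment], duration: float) -> List[Tuple[int, str]]:
    """Return (segment index, problem) pairs for time stamps that cannot be correct."""
    problems = []
    previous_end = 0.0
    for index, seg in enumerate(segments):
        if seg.start < 0 or seg.end > duration:
            problems.append((index, "outside recording"))
        if seg.end <= seg.start:
            problems.append((index, "non-positive length"))
        if seg.start < previous_end:
            problems.append((index, "overlaps previous segment"))
        previous_end = max(previous_end, seg.end)
    return problems


# Invented example data: the second caption starts too early, the third ends before it starts.
segments = [
    Segment(0.0, 2.5, "welcome back to the show", background="music"),
    Segment(2.0, 4.0, "today we are talking about travel"),
    Segment(4.0, 3.5, "let's get started"),
]
problems = find_alignment_errors(segments, duration=10.0)
print(problems)
```

A check like this only flags time stamps that are impossible on their face. Deciding where a caption really belongs would still require listening to the audio or running the aligner again.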
"Speech-recognition algorithms begin by mimicking what your ear does," Zahorian said. "But we want the algorithms to extract just the most useful characteristics of the speech, not all of the possible data. That's because more detail can actually hurt performance, past a certain point."
The field of automatic speech recognition has a long history, dating back to projects at Bell Labs before the computer age. These days, much of the technology relies on a
Contact: Gail Glover