Wednesday, August 30, 2006

Speech recognition on a chip


RegHardware reports: [edited]

Speech technology ranks right down there with flying cars, robots and Windows as the grandest of disappointments in geekdom. Thankfully, the horrid state of the technology hasn't broken the will of all researchers in the speech field. In fact, one team at Carnegie Mellon University optimistically thinks they may have solved the speech recognition conundrum with a new chip.

Armed with a $1m grant from the National Science Foundation, CMU's In Silico Vox team has set the modest goal of showing a 100 to 1,000 times improvement in the performance of speech recognition systems. Such a leap would improve the quality of speech technology to the point where it would be feasible to place sophisticated speech engines in devices such as cell phones or PDAs. Rob Rutenbar, a professor at CMU, unveiled the processor that is key to the project's end goal today at the Hot Chips conference here.

"It's just a bad idea trying to push this technology in software only," Rutenbar said. "Most of the applications of tomorrow don't want 20 to 30 per cent better performance. They want factors of 100 or factors of 1,000."

Rutenbar likened the move to create a speech chip with the established practice of creating specialized processors to deal with graphics operations.

"Nobody paints pixels in software," he said. "You would have to be nuts. Videos from ESPN are not painted on your cell phone screen by software. There's a small graphics engine doing that."

Some companies have produced decent speech recognition software for large call centers and automated phone systems. These packages, however, require far more processing power than you're likely to find on smaller computing devices.

Speech systems must compare some 50 basic sounds (phonemes) used in typical conversation against the thousands of variations people produce when they pronounce words in different ways. The engines then run through a database of common two- and three-word combinations, drawn from a vocabulary of some 50,000 words, to come up with strong matches for what a person is actually saying. All told, this process chews through processor, memory and energy resources. That's bad news for a cell phone designer.
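To make the second stage a little more concrete, here is a toy sketch of how "common two- and three-word combinations" can be used to pick the most plausible transcription. It assumes a standard trigram language model with add-one smoothing, a tiny made-up training text, and invented acoustic scores standing in for the sound-matching stage. None of this is the CMU team's design; real recognizers use far larger models and better smoothing.

```python
import math
from collections import Counter

# Tiny hypothetical training text; a real system would use a huge corpus.
corpus = ("when will windows arrive . when will the bus arrive . "
          "will windows ship").split()

VOCAB_SIZE = 50_000  # vocabulary size quoted in the article, used only for smoothing

# Count two-word and three-word combinations in the training text.
bigrams = Counter(zip(corpus, corpus[1:]))
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))


def trigram_logprob(w1, w2, w3):
    """Add-one (Laplace) smoothed estimate of log P(w3 | w1, w2)."""
    return math.log((trigrams[(w1, w2, w3)] + 1) / (bigrams[(w1, w2)] + VOCAB_SIZE))


def sentence_score(words, acoustic_logprob, lm_weight=1.0):
    """Add the language-model score of every trigram in `words` to an acoustic score."""
    lm = sum(trigram_logprob(*words[i:i + 3]) for i in range(len(words) - 2))
    return acoustic_logprob + lm_weight * lm


# Candidate transcriptions with made-up acoustic log-likelihoods.
# Note that the acoustics alone slightly prefer "widows".
candidates = {
    ("when", "will", "windows", "arrive"): -12.0,
    ("when", "will", "widows", "arrive"): -11.5,
    ("win", "well", "windows", "arrive"): -11.8,
}

best = max(candidates, key=lambda ws: sentence_score(ws, candidates[ws]))
print(" ".join(best))
```

In this sketch the word-combination statistics outweigh the slightly better acoustic score for "widows", so it prints `when will windows arrive`. Even this toy version hints at why the real thing is expensive: every candidate must be scored against the language model, and the tables grow with the vocabulary.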

The CMU team, however, has already created a lightweight hardware speech engine, based on an FPGA (Field Programmable Gate Array) from Xilinx, that solves many of these problems. Rutenbar showed the chip in action, successfully converting the question "When will Windows arrive?" into text on the screen.

Right now, the processor can only handle about 1,000 words at a modest speed. By the end of the year, CMU hopes to create a larger FPGA system capable of dealing with 5,000 words in real-time. Then, next year it will march to 10,000 and 50,000 words on the FPGA system, while exploring full-fledged silicon designs.
------------

5 comments:

Anonymous said...

Ah, finally... USEFUL technology! I've been waiting on decent speech recognition for quite some time. I'm longing for the day I can control my bedroom - nay, my house - just by barking orders at it.

Major Look said...

My guess is that regional accents may preclude certain users from benefiting from Speech Recognition.

I know that here in Wales, I have enough trouble understanding people, let alone get a computer chip to work it out!

Diolch yn fawr (Welsh for "thank you very much"), like.

Brett Jordan said...

Yes, speech recognition has major problems with accents. In fact, the changes in vocal pattern caused by a sore throat, or even tiredness, can confuse current technology.

And, of course, for speech recognition to be truly universal, at some point someone is going to have to encode the 6,800 living languages onto a chip!

Not sure what the gibble was you finished your comment with (did your fingers slip on the keyboard?), but thank you anyway :-)

Anonymous said...

I have a regional accent ("norn irn") and no speech recognition software has ever understood a word I've said to it. But then, nor does my husband. Do you think I could get the chip installed in him?

Brett Jordan said...

Hi Sandra, thanks for your comment... have you considered that it might be YOUR software that needs upgrading? :-)

 