Speech recognition for small vocabulary (about 20 words)

Question

I am currently working on a project for my university. The challenge is to write a speech recognition system that will run in the background on the phone, waiting for multiple commands (for example, call 0 123 ...).

This is a two-month project, so it doesn't have to be very precise. It only needs to tolerate a small amount of noise, and words will be separated by moments of silence.

I am currently at the point where I load a sample word encoded as raw 16-bit PCM, divide it into chunks (about 50 per second), and run an FFT on each chunk to get its frequency spectrum.
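The chunking and spectrum step described above could be sketched roughly as follows. This is only an illustration: the function names `splitIntoFrames` and `magnitudeSpectrum`, the Hann window, and the non-overlapping frames are assumptions, not details from the project. A naive O(n²) DFT stands in for the FFT to keep the sketch short.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Split 16-bit PCM samples into non-overlapping frames scaled to [-1, 1).
// framesPerSecond = 50 gives 20 ms frames; a trailing partial frame is dropped.
std::vector<std::vector<double>> splitIntoFrames(
    const std::vector<std::int16_t>& samples, int sampleRate, int framesPerSecond)
{
    assert(sampleRate > 0 && framesPerSecond > 0);
    const std::size_t frameSize =
        static_cast<std::size_t>(sampleRate / framesPerSecond);
    assert(frameSize > 0);

    std::vector<std::vector<double>> frames;
    for (std::size_t start = 0; start + frameSize <= samples.size(); start += frameSize) {
        std::vector<double> frame(frameSize);
        for (std::size_t i = 0; i < frameSize; ++i) {
            frame[i] = samples[start + i] / 32768.0;
        }
        frames.push_back(std::move(frame));
    }
    return frames;
}

// Magnitude spectrum of one frame for bins 0..n/2.
// Bin k corresponds to the frequency k * sampleRate / n.
// Assumption: a Hann window is applied first to reduce spectral leakage.
std::vector<double> magnitudeSpectrum(const std::vector<double>& frame)
{
    const std::size_t n = frame.size();
    const double pi = std::acos(-1.0);

    // Apply the window once, before the transform.
    std::vector<double> windowed(n);
    for (std::size_t t = 0; t < n; ++t) {
        const double w = (n > 1) ? 0.5 - 0.5 * std::cos(2.0 * pi * t / (n - 1)) : 1.0;
        windowed[t] = w * frame[t];
    }

    // Naive DFT; a real FFT would replace this double loop.
    std::vector<double> mags(n / 2 + 1, 0.0);
    for (std::size_t k = 0; k < mags.size(); ++k) {
        double re = 0.0;
        double im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = 2.0 * pi * k * t / n;
            re += windowed[t] * std::cos(angle);
            im -= windowed[t] * std::sin(angle);
        }
        mags[k] = std::sqrt(re * re + im * im);
    }
    return mags;
}
```

With an assumed sample rate of 8000 Hz and 50 frames per second, each frame holds 160 samples and each spectrum bin is 50 Hz wide.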

Things to decide: 1) how to go through a longer recording and break it down into words; 2) how to find the best match for a word.

1) I was thinking of simply checking chunk after chunk, and if I come across several consecutive chunks with higher amplitudes at human voice frequencies, treating that as the start of a word. In any case, I am looking for resources that can help with this.

2) This one is a bit harder. Should I use an HMM for such a system, or are there simpler methods that take advantage of the vocabulary being so small (20 words)?
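For point 1), a minimal sketch of the chunk-by-chunk idea might look like this. It assumes a per-chunk energy value has already been computed (for example with the hypothetical `frameEnergy` helper below, or from the spectrum bins in the voice band). The names `Segment` and `findWords` and the parameters `threshold`, `minSpeech`, and `minSilence` are all illustrative and would need tuning on real recordings.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open range of frame indices [begin, end) that belongs to one word.
struct Segment {
    std::size_t begin;
    std::size_t end;
};

// Mean squared amplitude of one frame (hypothetical helper).
double frameEnergy(const std::vector<double>& frame)
{
    if (frame.empty()) return 0.0;
    double sum = 0.0;
    for (double s : frame) sum += s * s;
    return sum / static_cast<double>(frame.size());
}

// A word starts at the first frame whose energy exceeds `threshold` and ends
// once `minSilence` consecutive quiet frames have been seen. Candidates
// shorter than `minSpeech` frames are discarded as clicks or noise.
std::vector<Segment> findWords(const std::vector<double>& energy,
                               double threshold,
                               std::size_t minSpeech,
                               std::size_t minSilence)
{
    assert(minSilence >= 1);
    std::vector<Segment> words;
    bool inWord = false;
    std::size_t begin = 0;
    std::size_t lastLoud = 0;  // index of the most recent loud frame

    for (std::size_t i = 0; i < energy.size(); ++i) {
        if (energy[i] > threshold) {
            if (!inWord) {
                inWord = true;
                begin = i;
            }
            lastLoud = i;
        } else if (inWord && i - lastLoud >= minSilence) {
            // Enough silence has passed: close the current candidate.
            if (lastLoud + 1 - begin >= minSpeech) {
                words.push_back({begin, lastLoud + 1});
            }
            inWord = false;
        }
    }
    // Close a word that is still open when the recording ends.
    if (inWord && lastLoud + 1 - begin >= minSpeech) {
        words.push_back({begin, lastLoud + 1});
    }
    return words;
}
```

Requiring a minimum run of silence keeps short dips inside a word from splitting it in two.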

Edit: The point of the project is to write the system myself, so I can't use off-the-shelf libraries like Sphinx or HTK.

Regards, Karol

+3

c++ fft hidden-markov-models speech-recognition speech-to-text

Karol Czaradzki May 20 '15 at 12:32

2 answers

To recognize commands on your phone, you can use Pocketsphinx. A tutorial that covers speech recognition applications on Android is available on the CMUSphinx website.

0

Nikolay Shmyrev May 20 '15 at 13:38

Karol Czaradzki · Accepted Answer · 2015-07-29T16:52:02+0000

If anyone has the same question in the future, look up these two key terms:

MFCC - Mel-Frequency Cepstral Coefficients, for calculating a series of coefficients for each word pattern

DTW - Dynamic Time Warping, for matching a captured word against the patterns. A good enough description of DTW can be found on Wikipedia.

This approach was good enough to reach about 80% recognition accuracy on the 20-word vocabulary and to give a good demonstration in class.
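For readers who want a starting point, the DTW matching step could be sketched as below. It assumes each word has already been turned into a sequence of feature vectors (for example, one MFCC vector per chunk). The names `dtwDistance` and `bestMatch`, the Euclidean local distance, and the unconstrained warping path are illustrative choices, not necessarily what was used in the project.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Feature = std::vector<double>;   // e.g. the MFCCs of one chunk
using Sequence = std::vector<Feature>; // one word as a series of chunks

// Euclidean distance between two feature vectors of equal length.
double euclidean(const Feature& a, const Feature& b)
{
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Classic dynamic-programming DTW: cost[i][j] is the cheapest alignment of
// the first i frames of `a` with the first j frames of `b`.
double dtwDistance(const Sequence& a, const Sequence& b)
{
    const double inf = std::numeric_limits<double>::infinity();
    if (a.empty() || b.empty()) return inf;

    std::vector<std::vector<double>> cost(
        a.size() + 1, std::vector<double>(b.size() + 1, inf));
    cost[0][0] = 0.0;

    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const double d = euclidean(a[i - 1], b[j - 1]);
            // Extend the cheapest of: stretch a, stretch b, or advance both.
            cost[i][j] = d + std::min({cost[i - 1][j],
                                       cost[i][j - 1],
                                       cost[i - 1][j - 1]});
        }
    }
    return cost[a.size()][b.size()];
}

// Index of the stored pattern with the smallest DTW distance to `word`.
std::size_t bestMatch(const Sequence& word, const std::vector<Sequence>& patterns)
{
    assert(!patterns.empty());
    std::size_t best = 0;
    double bestDist = dtwDistance(word, patterns[0]);
    for (std::size_t k = 1; k < patterns.size(); ++k) {
        const double dist = dtwDistance(word, patterns[k]);
        if (dist < bestDist) {
            bestDist = dist;
            best = k;
        }
    }
    return best;
}
```

Because DTW lets one sequence stretch against the other, the same word spoken more slowly can still align with its pattern at low cost. With 20 stored patterns, comparing against every one of them is cheap enough.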
