How to count the number of spoken words with any method (SR or otherwise)

I'm having trouble finding pointers on how to accomplish what seems like a deceptively simple task:

Given an audio stream, how do you count the number of words spoken, in real time?

I don't need to know what the words are; I just need a reasonably accurate count of how many have been spoken. The counter does not need to be very precise, and it may even count non-word utterances and other "grunts" such as coughing.

It seems that all speech recognition systems depend on a predefined grammar that must be supplied before they can map the spoken phonemes to known words with some degree of accuracy. But I don't care about recognition accuracy; I only care about the rate at which words are spoken.

The important thing is that this is done in real time, so that the system can raise an alert after a certain number of words have been spoken. The system will trigger a visual cue telling the speaker to pause, and then the speaker can continue.

I looked through the CMU Sphinx FAQ and found that the idea of "word recognition" is not yet supported. I don't really need real-time search for specific words, but it comes close to what I'm looking for. Looking for brief silences in the waveform seems like a very crude way to do this, and probably not a very accurate one, but that's all I have for now.
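To make the crude silence-based idea concrete, here is a minimal sketch of what I have in mind. It counts bursts of sound separated by short silences, not actual words, so it will merge words spoken without a pause and will count coughs. Everything in it is my own assumption for illustration: the names `frame_rms` and `BurstCounter`, the RMS threshold of 0.05, the three-frame silence gap, and the frame size are all untuned guesses, and the samples are assumed to be floats in the range [-1, 1].

```python
import math


def frame_rms(frame):
    """Root-mean-square amplitude of one frame of float samples in [-1, 1]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class BurstCounter:
    """Counts bursts of sound separated by short silences, frame by frame.

    Hypothetical helper: the default values below are guesses, not tuned.
    """

    def __init__(self, threshold=0.05, min_silence_frames=3, alert_every=50):
        self.threshold = threshold                    # RMS level treated as "sound"
        self.min_silence_frames = min_silence_frames  # quiet frames that end a burst
        self.alert_every = alert_every                # alert after this many bursts
        self.count = 0
        self._in_burst = False
        self._quiet_run = 0

    def feed(self, frame):
        """Process one frame; return True if this frame should trigger the cue."""
        if frame_rms(frame) >= self.threshold:
            self._quiet_run = 0
            if not self._in_burst:
                # A new burst starts here, so it is counted immediately.
                self._in_burst = True
                self.count += 1
                return self.count % self.alert_every == 0
        else:
            self._quiet_run += 1
            if self._quiet_run >= self.min_silence_frames:
                self._in_burst = False
        return False


if __name__ == "__main__":
    # Synthetic test signal: 160-sample frames (10 ms at an assumed 16 kHz).
    loud = [0.5, -0.5] * 80
    quiet = [0.0] * 160
    frames = ([loud] * 5 + [quiet] * 4) * 3  # three bursts

    counter = BurstCounter(alert_every=2)
    for index, frame in enumerate(frames):
        if counter.feed(frame):
            print(f"pause cue at frame {index}")
    print(f"bursts counted: {counter.count}")
```

Each burst is counted at its first loud frame rather than at its end, so the cue can fire without waiting for the following silence. Pauses shorter than `min_silence_frames` do not split a burst, which is exactly where this approach undercounts fluent speech.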

Any pointers to algorithms, research papers, or any other ideas would be appreciated!
