Android: speech recognition, what technologies are available

I am new to "voice recognition" on Android.

I have a requirement in my application to have "speech recognition", so I did my homework. Here is what I found: the Android SDK supports this, and it uses Google voice recognition. From what I understand, whether we call the recognizer with an intent or use the SpeechRecognizer class, the actual recognition is done on Google's cloud servers. I have tried sample apps using both methods, and the match rate is very low in both cases. (First, is my finding correct? I did not get a correct match for most of the words and sentences I tried.)

  • Will there be a difference in results between these two methods, i.e. triggering by intent or using the SpeechRecognizer class?

  • All apps seem to depend on this Google technology, where the voice is sent as audio bytes and recognized on the cloud server. I saw that Shazam uses a different technology, but they have their own database. Are there any other technologies in use?

  • I've seen a lot of "Siri for Android" apps. Any notes on how these apps actually work?

Thanks a lot for your time and help.



2 answers


1) You will get the same results when using RecognizerIntent or SpeechRecognizer. The main difference is in the user experience. RecognizerIntent forces the user to go through the standard speech recognition dialog. With SpeechRecognizer, you can control how the application collects speech and when it processes it. The advantage of RecognizerIntent is that it is easy to program and familiar to users. With SpeechRecognizer, you can implement advanced features such as listening for speech in the background. You also get error reports.

Also, some words, such as "apple", are easy to recognize, while others, such as "cumin", are hard for various reasons. You have to be smart about matching what Google returns in order to implement something reliable.
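To make the matching idea concrete, here is a minimal sketch in plain Java. It assumes you already have the list of candidate strings from the recognizer (for example the list stored under RecognizerIntent.EXTRA_RESULTS or SpeechRecognizer.RESULTS_RECOGNITION) and a small vocabulary your app cares about. The class CommandMatcher, its method names, and the choice of edit distance as the similarity measure are all my own assumptions for illustration, not something the SDK provides; a phonetic comparison might work better for some words.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the Android SDK.
final class CommandMatcher {

    // Levenshtein edit distance: number of single-character inserts,
    // deletes, or substitutions needed to turn a into b.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the vocabulary word closest to any recognizer candidate,
    // or null if no word is within maxDistance edits.
    static String bestMatch(List<String> candidates, List<String> vocabulary,
                            int maxDistance) {
        String best = null;
        int bestDistance = maxDistance + 1;
        for (String candidate : candidates) {
            String heard = candidate.toLowerCase(Locale.ROOT).trim();
            for (String word : vocabulary) {
                int d = editDistance(heard, word.toLowerCase(Locale.ROOT));
                if (d < bestDistance) {
                    bestDistance = d;
                    best = word;
                }
            }
        }
        return best;
    }
}
```

With a vocabulary of "cumin", "apple", and "basil", a candidate list such as "human", "cuman" would map to "cumin" when maxDistance is 2, while an unrelated phrase would return null so the app can ask the user to repeat.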

2) I'm not sure what you mean by their own database. Your application will have a "database" that you are trying to map to what the user is saying.



3) Probably a combination of natural language processing, user modeling, and techniques to emulate human dialogue. Or it is just a big set of hand-coded rules that make them look smart. I think it takes a lot of work to make something believable.
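As a rough idea of what the "hand-coded rules" option could look like at its very simplest, here is a sketch that maps keywords in the recognized text to canned replies. Everything in it (the class RuleBasedAssistant, the keywords, the replies) is invented for illustration; I do not know how any particular app is implemented, and a believable assistant would need far more than this.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not how any specific app is known to work.
final class RuleBasedAssistant {

    // Keyword -> canned reply. LinkedHashMap keeps insertion order,
    // so earlier rules win when several keywords appear.
    private final Map<String, String> rules = new LinkedHashMap<>();

    RuleBasedAssistant() {
        rules.put("weather", "Opening the weather forecast.");
        rules.put("call", "Who would you like to call?");
        rules.put("hello", "Hello! How can I help?");
    }

    // Returns the reply of the first rule whose keyword occurs in the
    // utterance (case-insensitive), or a fallback if none does.
    String reply(String utterance) {
        String text = utterance.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (text.contains(rule.getKey())) {
                return rule.getValue();
            }
        }
        return "Sorry, I did not understand that.";
    }
}
```

The recognized text from the speech recognizer would be passed to reply(), and the returned string could then be spoken back with text-to-speech.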

Check out some of my code examples here: https://github.com/gmilette/Say-the-Magic-Word-



Yes, you're on the right track. Here is a nice article on speech recognition, and I think you will also find some interesting information at this link.


