Vocal Trigger Identification

Speech recognition on handheld devices is usually triggered by a button press. How can I get speech recognition to run without one? My Raspberry Pi device intentionally has no controls for users to interact with: just a microphone hanging on the wall.

I am trying to implement a simple trigger command that initiates a sequence of actions. In short, I want to run one .sh script whenever the device "hears" a sound trigger. I don't need it to understand anything other than the trigger itself; nothing has to be decoded from the trigger, like a script name or parameters. The whole function is "hear trigger → execute .sh script".

I explored various options:

  • Continuously streaming audio to Google Speech Recognition. Not a good idea: it wastes far too much bandwidth and too many resources.

  • Running an offline speech recognition app that continuously listens to the audio stream and picks out trigger words. A little better, but still largely a waste of resources, and these systems need to be trained on sound samples, which removes the ability to quickly assign custom names to devices.

  • Using some sort of tone processing to react to a sequence of loud noises, like clapping twice. Not too bad, but I think my hands would fall off before long, and I'd be killed by a family member, since I usually experiment with my toys at night while everyone is in bed.

  • Whistle recognition. Not much different from the previous option, but my palms wouldn't hurt, and I would most likely survive the testing if I learn to whistle softly. I did find an IBM article on commanding a computer with whistles; that approach is a lot like local speech recognition applications, except you teach it to understand different whistle sequences. What I couldn't figure out from it is how to teach it to recognise any whistle, regardless of its pitch.
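For what it's worth, the double-clap option above can be prototyped with no recognition engine at all: track the amplitude envelope and look for two loud spikes close together in time. A minimal Python sketch, where the peak list, threshold, and timing window are all illustrative assumptions (a real version would derive the peaks from microphone frames):

```python
def detect_double_clap(peaks, threshold=0.6, min_gap=0.1, max_gap=1.0):
    """Return True if two amplitude peaks above `threshold` occur
    between `min_gap` and `max_gap` seconds apart.
    `peaks` is a list of (timestamp_seconds, amplitude) tuples."""
    loud = [t for t, a in peaks if a >= threshold]
    for i in range(len(loud) - 1):
        if min_gap <= loud[i + 1] - loud[i] <= max_gap:
            return True
    return False

# Two claps 0.4 s apart trigger; a lone clap or a slow pair does not.
print(detect_double_clap([(0.0, 0.9), (0.4, 0.8)]))  # True
print(detect_double_clap([(0.0, 0.9), (2.5, 0.8)]))  # False
```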

I am leaning toward the whistle idea, since it seems to be the least resource-hungry of these options. How can I implement it?

Are there other vocal triggers that can be easily implemented, given that I am limited to Raspberry Pi hardware?





1 answer


Mono is an environment you can install on the Pi that lets you compile and run C# applications, and I believe it supports System.Speech and System.Speech.Recognition. You can use them to write a small application and simply specify which words to listen for. Build it on your computer, copy the exe to the Pi, and run it with Mono (mono YourApp.exe) with the mic plugged into the Pi. I made a similar application, but I used a socket server and sent commands that way. Setting up the commands is pretty straightforward.



    using System;
    using System.Speech.Recognition;

    class TriggerListener
    {
        static void Main()
        {
            var rec = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));
            rec.SetInputToDefaultAudioDevice();
            rec.SpeechRecognized += speech_recognized;
            rec.MaxAlternates = 1; // only the best match is needed

            // Restrict the grammar to the single trigger word.
            var c = new Choices();
            c.Add("Trigger");
            var gb = new GrammarBuilder(c);
            var g = new Grammar(gb);
            rec.LoadGrammar(g);

            rec.RecognizeAsync(RecognizeMode.Multiple); // keep listening after each match
            Console.ReadLine(); // keep the process alive
        }

        static void speech_recognized(object sender, SpeechRecognizedEventArgs e)
        {
            if (e.Result.Text == "Trigger")
            {
                // run your script
            }
        }
    }









