AVAudioSession issue when using SFSpeechRecognizer after AVSpeechUtterance

I am trying to use SFSpeechRecognizer for speech-to-text after delivering a welcome message to the user via AVSpeechUtterance. But randomly, speech recognition doesn't start (after the welcome message has been spoken) and it throws the error message below.

[avas] ERROR: AVAudioSession.mm:1049: -[AVAudioSession setActive:withOptions:error:]: Deactivating an audio session that has running I/O. All I/O should be stopped or paused prior to deactivating the audio session.

It works several times in a row, then fails; it's not clear why it doesn't work consistently.

I tried the solutions mentioned in other SO posts, such as checking whether any audio players are still running. I added that check to the speech-to-text part of the code. It returns false (i.e. no other audio player is playing), but speech-to-text still doesn't start listening for the user's speech. Can you tell me what is going wrong?
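For reference, a rough sketch of that check (assuming AVAudioSession's otherAudioPlaying property is what's being consulted; my actual check may differ in details):

// Assumes AVFoundation is imported.
// Returns NO in my case, i.e. no other audio player is active.
BOOL otherAudioPlaying = [[AVAudioSession sharedInstance] isOtherAudioPlaying];
NSLog(@"Other audio playing: %@", otherAudioPlaying ? @"YES" : @"NO");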

Testing on iPhone 6 running iOS 10.3

Below are the code snippets:

Text to speech:

- (void) speak:(NSString *) textToSpeak {
    [[AVAudioSession sharedInstance] setActive:NO withOptions:0 error:nil];
    [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayback
      withOptions:AVAudioSessionCategoryOptionDuckOthers error:nil];

    [synthesizer stopSpeakingAtBoundary:AVSpeechBoundaryImmediate];

    AVSpeechUtterance *utterance = [[AVSpeechUtterance alloc] initWithString:textToSpeak];
    utterance.voice = [AVSpeechSynthesisVoice voiceWithLanguage:locale];
    utterance.rate = (AVSpeechUtteranceMinimumSpeechRate * 1.5 + AVSpeechUtteranceDefaultSpeechRate) / 2.5 * rate * rate;
    utterance.pitchMultiplier = 1.2;
    [synthesizer speakUtterance:utterance];
}

- (void)speechSynthesizer:(AVSpeechSynthesizer*)synthesizer didFinishSpeechUtterance:(AVSpeechUtterance*)utterance {
    //Return success message back to caller

    [[AVAudioSession sharedInstance] setActive:NO withOptions:0 error:nil];
    [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryAmbient
      withOptions: 0 error: nil];
    [[AVAudioSession sharedInstance] setActive:YES withOptions: 0 error:nil];
}


Speech to text:

- (void) recordUserSpeech:(NSString *) lang {
    NSLocale *locale = [[NSLocale alloc] initWithLocaleIdentifier:lang];
    self.sfSpeechRecognizer = [[SFSpeechRecognizer alloc] initWithLocale:locale];
    [self.sfSpeechRecognizer setDelegate:self];

    NSLog(@"Step1: ");
    // Cancel the previous task if it's running.
    if ( self.recognitionTask ) {
        NSLog(@"Step2: ");
        [self.recognitionTask cancel];
        self.recognitionTask = nil;
    }

    NSLog(@"Step3: ");
    [self initAudioSession];

    self.recognitionRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
    NSLog(@"Step4: ");

    if (!self.audioEngine.inputNode) {
        NSLog(@"Audio engine has no input node");
    }

    if (!self.recognitionRequest) {
        NSLog(@"Unable to created a SFSpeechAudioBufferRecognitionRequest object");
    }

    self.recognitionTask = [self.sfSpeechRecognizer recognitionTaskWithRequest:self.recognitionRequest resultHandler:^(SFSpeechRecognitionResult *result, NSError *error) {

        BOOL isFinal = NO;

        if (error) {
            [self stopAndRelease];
            NSLog(@"In recognitionTaskWithRequest.. Error code ::: %ld, %@", (long)error.code, error.description);
            [self sendErrorWithMessage:error.localizedFailureReason andCode:error.code];
        }

        if (result) {

            [self sendResults:result.bestTranscription.formattedString];
            isFinal = result.isFinal;
        }

        if (isFinal) {
            NSLog(@"result.isFinal: ");
            [self stopAndRelease];
            //return control to caller
        }
    }];

    NSLog(@"Step5: ");

    AVAudioFormat *recordingFormat = [self.audioEngine.inputNode outputFormatForBus:0];

    [self.audioEngine.inputNode installTapOnBus:0 bufferSize:1024 format:recordingFormat block:^(AVAudioPCMBuffer * _Nonnull buffer, AVAudioTime * _Nonnull when) {
        //NSLog(@"Installing Audio engine: ");
        [self.recognitionRequest appendAudioPCMBuffer:buffer];
    }];

    NSLog(@"Step6: ");

    [self.audioEngine prepare];
    NSLog(@"Step7: ");
    NSError *err;
    [self.audioEngine startAndReturnError:&err];
}
- (void) initAudioSession
{
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    [audioSession setCategory:AVAudioSessionCategoryRecord error:nil];
    [audioSession setMode:AVAudioSessionModeMeasurement error:nil];
    [audioSession setActive:YES withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:nil];
}

-(void) stopAndRelease
{
    NSLog(@"Invoking SFSpeechRecognizer stopAndRelease: ");
    [self.audioEngine stop];
    [self.recognitionRequest endAudio];
    [self.audioEngine.inputNode removeTapOnBus:0];
    self.recognitionRequest = nil;
    [self.recognitionTask cancel];
    self.recognitionTask = nil;
}


With the logging added, I can see all log statements up to "Step7".

When debugging on the device, the two lines below consistently trigger breaks (I have exception breakpoints enabled), although execution continues when I resume. Notably, this also happens on the runs that succeed.

AVAudioFormat *recordingFormat = [self.audioEngine.inputNode outputFormatForBus:0];

[self.audioEngine prepare];



1 answer


The reason is that the audio output has not fully finished by the time -speechSynthesizer:didFinishSpeechUtterance: is called, which is why you get such an error when trying to call setActive:NO. You cannot deactivate an AVAudioSession, or change any of its settings, while I/O is running. The workaround: wait a few ms (how long is discussed below) and only then deactivate the AVAudioSession.
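As a sketch of this workaround (illustrative only; the exact wait is derived below), the deactivation in the didFinishSpeechUtterance callback could be deferred like this:

// Assumes AVFoundation is imported; the 100 ms margin is an assumption,
// derived from IOBufferDuration plus hardware latency as explained below.
- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer
 didFinishSpeechUtterance:(AVSpeechUtterance *)utterance {
    dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(0.1 * NSEC_PER_SEC)),
                   dispatch_get_main_queue(), ^{
        NSError *error = nil;
        if (![[AVAudioSession sharedInstance] setActive:NO withOptions:0 error:&error]) {
            NSLog(@"Deactivation failed: %@", error);
        }
    });
}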

A few words about the end of audio playback.

It may seem strange at first, but I've spent a lot of time researching this problem. When you push the last chunk of audio to the device's output, you only have an approximate estimate of when it will actually finish playing. Take a look at the AVAudioSession ioBufferDuration property:

The audio I/O buffer duration is the number of seconds for a single audio input/output cycle. For example, with an I/O buffer duration of 0.005 s, during each audio I/O cycle:

  • You receive 0.005 s of audio when obtaining input.
  • You must provide 0.005 s of audio when providing output.

The typical maximum I/O buffer duration is 0.093 s (which corresponds to 4096 frames at a 44.1 kHz sample rate). The minimum I/O buffer duration is at least 0.005 s (256 frames), but may be lower depending on the hardware in use.



So we can interpret this value as the playback time of a single audio fragment. But there is still a small, unaccounted-for gap between that estimate and the moment playback actually completes (hardware latency). I would say you need to wait about ioBufferDuration * 1000 + delay ms for the sound to finish (ioBufferDuration * 1000 because the property is expressed in seconds), where delay is some fairly small additional value.
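A minimal sketch of computing that wait, assuming the session's IOBufferDuration and outputLatency properties are used (treating outputLatency as the delay term is my assumption):

// Assumes AVFoundation is imported.
AVAudioSession *session = [AVAudioSession sharedInstance];
NSTimeInterval bufferDuration = session.IOBufferDuration; // seconds per I/O cycle
NSTimeInterval hardwareDelay  = session.outputLatency;    // assumed stand-in for "delay"
double waitMs = (bufferDuration + hardwareDelay) * 1000.0;
NSLog(@"Wait ~%.1f ms before deactivating the session", waitMs);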

It seems even Apple's own developers are not entirely sure about the exact audio end time. A quick look at the AVAudioPlayerNode class and its func scheduleBuffer(_ buffer: AVAudioPCMBuffer, completionHandler: AVFoundation.AVAudioNodeCompletionHandler? = nil) method:

@param completionHandler called after the buffer has been consumed by the player or the player is stopped. may be nil.

@discussion Schedules the buffer to be played following any previously scheduled commands. It is possible for the completionHandler to be called before rendering begins or before the buffer is played completely.
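For illustration, a hypothetical call (playerNode and pcmBuffer are assumed to already exist) showing why this handler cannot be trusted as an "audio finished" signal:

// playerNode is an AVAudioPlayerNode attached to a running engine, and
// pcmBuffer an AVAudioPCMBuffer; both are assumed for this sketch.
[playerNode scheduleBuffer:pcmBuffer completionHandler:^{
    // Per the header doc, this may fire before the buffer finishes playing.
    NSLog(@"Buffer consumed, but playback may not have completed yet");
}];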

You can read more about audio processing in Apple's Audio Unit documentation (AudioUnit is the low-level API that provides access to I/O data).
