Google Speech API returning empty JSON response

I want to use the Google Speech V1 API with Python.

So far I have gotten the Google URI example to work and received content back. When I changed the code to use a custom recorded audio file, I still get a response from Google, but it contains no transcribed content.

I set up the request like this:

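import base64
import json
import time
from datetime import datetime

# get_speech_service() is assumed to be defined elsewhere (not shown here).

"""Transcribe the given raw audio file asynchronously.
Args:
    audio_file: the raw audio file.
"""
audio_file = 'audioFiles/test.raw'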

with open(audio_file, 'rb') as speech:
    speech_content = base64.b64encode(speech.read())

service = get_speech_service()
service_request = service.speech().asyncrecognize(
    body={
        'config': {
            'encoding': 'LINEAR16',
            'sampleRate': 16000, 
            'languageCode': 'en-US',
        },
        'audio': {
            'content': speech_content.decode('utf-8', 'ignore')
        }
    })
response = service_request.execute()

print(json.dumps(response))

name = response['name']

service = get_speech_service()
service_request = service.operations().get(name=name)

while True:
    # Get the long running operation with response.
    response = service_request.execute()

    if response.get('done'):
        break
    # Give the server time to process before polling again.
    print('%s, waiting for results from job, %s'
          % (datetime.now().replace(second=0, microsecond=0), name))
    time.sleep(60)

print(json.dumps(response))

      

which gives me this output:

kayl@kayl-Surface-Pro-3:~/audioConversion$ python speechToText.py 
{"name": "527788331906219767"}
2017-03-30 20:10:00, waiting for results from job, 527788331906219767
{"response": {"@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"},"done": true, "name": "527788331906219767", "metadata": {"lastUpdateTime": "2017-03-31T03:11:16.391628Z", "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata", "startTime": "2017-03-31T03:10:52.351004Z", "progressPercent": 100}}

      

I expected the response to contain results, in the form:

{"response": {"@type":"type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse", "results":{...}}...
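
For reference, a minimal sketch of how transcripts could be pulled out of such an operation once `results` is present. It assumes the v1beta1 layout, in which `results` is a list and each result carries a list of `alternatives` with a `transcript` field. `extract_transcripts` is a hypothetical helper name, not part of any Google library.

```python
def extract_transcripts(operation):
    """Collect transcripts from a finished long-running operation dict.

    Assumes the v1beta1 layout: response.results is a list, each result has
    a list of alternatives, and each alternative has a 'transcript' string.
    Returns an empty list when the response carries no results.
    """
    results = operation.get('response', {}).get('results', [])
    return [
        alternative['transcript']
        for result in results
        for alternative in result.get('alternatives', [])
        if 'transcript' in alternative
    ]


# Same shape as the operation printed above: done, but no 'results' key.
empty_operation = {
    'done': True,
    'response': {
        '@type': 'type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse',
    },
}
print(extract_transcripts(empty_operation))  # prints []
```

An empty list here means the service finished without recognizing any speech, which is different from a request that failed with an error.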

      

Details of the audio file:

  • Sample rate 16000 Hz (I tried 41000 Hz as well)
  • 16-bit little endian
  • Signed
  • 65 seconds

To record this audio, I ran:

arecord -f cd -d 65 -r 16000 -t raw test.raw
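
One thing that may be worth checking: a raw file has no header, so its format cannot be read back from the file itself. In arecord, `-f cd` selects 16-bit little-endian, 44100 Hz, stereo, and `-r 16000` overrides only the rate, so this command probably records two channels, while the request config shown above does not describe a channel count. Below is a minimal sketch for estimating the channel count from the file size, assuming 16-bit samples and a recording that ran for the full duration. `infer_channels` is a hypothetical helper name.

```python
def infer_channels(num_bytes, sample_rate, seconds, bytes_per_sample=2):
    """Estimate the channel count of headerless PCM audio from its size."""
    bytes_per_channel = sample_rate * bytes_per_sample * seconds
    return int(round(num_bytes / float(bytes_per_channel)))


# 65 seconds of 16-bit audio at 16000 Hz is 2,080,000 bytes per channel.
print(infer_channels(2080000, 16000, 65))  # prints 1
print(infer_channels(4160000, 16000, 65))  # prints 2
```

In practice the size would come from `os.path.getsize('audioFiles/test.raw')`. If the estimate is 2, recording with `-f S16_LE -c 1 -r 16000` instead of `-f cd` should produce mono audio.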

      

Any advice that could point me in the right direction would be greatly appreciated.

1 answer


Your example is basically the same as this example, which works for me with the test audio files.

Does your code work with the test sample audio.raw? If so, the problem is most likely the encoding of your recording. I have had the most success with FLAC files and with audio recorded as recommended in the best practices. I have also used Audacity in the past to take some of the guesswork out of recording.

On Mac OS X, the following shell command recorded 65 seconds of audio:

  rec --channels=1 --bits=16 --rate=44100 audio.wav trim 0 65
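
Unlike a raw file, a WAV file carries its format in the header, so the recording can be checked before it is sent. Below is a minimal sketch using Python's standard `wave` module. `describe_wav` and `make_test_wav` are hypothetical helper names, and `make_test_wav` exists only to give the example something to read.

```python
import io
import wave


def describe_wav(source):
    """Return (channels, bits_per_sample, sample_rate, seconds) for a WAV.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, 'rb') as wav:
        channels = wav.getnchannels()
        bits = wav.getsampwidth() * 8
        rate = wav.getframerate()
        seconds = wav.getnframes() / float(rate)
    return channels, bits, rate, seconds


def make_test_wav(channels, rate, seconds):
    """Build an in-memory 16-bit WAV of silence (for illustration only)."""
    buf = io.BytesIO()
    wav = wave.open(buf, 'wb')
    wav.setnchannels(channels)
    wav.setsampwidth(2)
    wav.setframerate(rate)
    wav.writeframes(b'\x00\x00' * channels * rate * seconds)
    wav.close()
    buf.seek(0)
    return buf


print(describe_wav(make_test_wav(1, 44100, 1)))  # prints (1, 16, 44100, 1.0)
```

For the recording above, `describe_wav('audio.wav')` would be expected to report one channel, 16 bits, and 44100 Hz, matching the `encoding` and `sample_rate` passed to the API.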

      



Then I use the following code to transcribe the audio:

import io
import time

from google.cloud import speech

# Assumed to be the recording made with the command above.
speech_file = 'audio.wav'

speech_client = speech.Client()

with io.open(speech_file, 'rb') as audio_file:
    content = audio_file.read()
    audio_sample = speech_client.sample(
        content,
        source_uri=None,
        encoding='LINEAR16',
        sample_rate=44100)

operation = speech_client.speech_api.async_recognize(audio_sample)

# Poll every 2 seconds, giving up after 100 attempts.
retry_count = 100
while retry_count > 0 and not operation.complete:
    retry_count -= 1
    time.sleep(2)
    operation.poll()

if not operation.complete:
    print('Operation not complete and retry limit reached.')
else:
    alternatives = operation.results
    for alternative in alternatives:
        print('Transcript: {}'.format(alternative.transcript))
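
The bounded polling loop above could also be factored into a small reusable helper. This is only a sketch: `poll_until` is a hypothetical name, and the `sleep` parameter is there so the delay can be replaced in tests.

```python
import time


def poll_until(is_done, poll, max_attempts=100, delay=2.0, sleep=time.sleep):
    """Call poll() until is_done() is true or max_attempts is used up.

    Returns True if the work finished, False if the retry limit was reached.
    """
    attempts = 0
    while attempts < max_attempts and not is_done():
        attempts += 1
        sleep(delay)
        poll()
    return is_done()
```

With the operation object from the code above, this would be called as `poll_until(lambda: operation.complete, operation.poll)`.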

      

Note that my example uses the newer client library, which makes API access easier. This code sample is the starting point from which I derived my example.
