Google Speech API returning empty JSON response
I want to use the Google Speech V1 API with Python.
So far I have gotten the Google URI example to work and received content back. When I changed the code to use a custom recorded audio file, I still get a response from Google, but it contains no transcribed content.
I set up the request like this:
"""Transcribe the given raw audio file asynchronously.
Args:
    audio_file: the raw audio file.
"""
import base64
import json
import time
from datetime import datetime

# get_speech_service() is the helper from the Google sample this code is based on (not shown here).
audio_file = 'audioFiles/test.raw'
with open(audio_file, 'rb') as speech:
    speech_content = base64.b64encode(speech.read())

service = get_speech_service()
service_request = service.speech().asyncrecognize(
    body={
        'config': {
            'encoding': 'LINEAR16',
            'sampleRate': 16000,
            'languageCode': 'en-US',
        },
        'audio': {
            'content': speech_content.decode('utf-8', 'ignore')
        }
    })
response = service_request.execute()
print(json.dumps(response))

name = response['name']
service = get_speech_service()
service_request = service.operations().get(name=name)

while True:
    # Get the long running operation with response.
    response = service_request.execute()
    if 'done' in response and response['done']:
        break
    else:
        # Give the server a few seconds to process.
        print('%s, waiting for results from job, %s' % (datetime.now().replace(second=0, microsecond=0), name))
        time.sleep(60)

print(json.dumps(response))
which gives me this output:
kayl@kayl-Surface-Pro-3:~/audioConversion$ python speechToText.py
{"name": "527788331906219767"}
2017-03-30 20:10:00, waiting for results from job, 527788331906219767
{"response": {"@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"},"done": true, "name": "527788331906219767", "metadata": {"lastUpdateTime": "2017-03-31T03:11:16.391628Z", "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata", "startTime": "2017-03-31T03:10:52.351004Z", "progressPercent": 100}}
I expected to get a response of this form instead:
{"response": {"@type":"type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse", "results":{...}}...
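For reference, here is a minimal sketch of pulling transcripts out of a finished operation like the ones shown above. It assumes that `results` is a list of objects, each with an `alternatives` list ordered best-first and a `transcript` field in every alternative; that layout is an assumption and should be checked against the API reference. The function name `extract_transcripts` is hypothetical.

```python
def extract_transcripts(operation):
    """Return the top transcript of each result in a finished operation.

    Assumed layout: operation["response"]["results"] is a list of
    objects, each holding an "alternatives" list ordered best-first.
    """
    results = operation.get("response", {}).get("results", [])
    transcripts = []
    for result in results:
        alternatives = result.get("alternatives", [])
        if alternatives:
            transcripts.append(alternatives[0].get("transcript", ""))
    return transcripts


# A finished operation without a "results" key, like the output above.
empty = {"done": True, "response": {"@type": "..."}}
print(extract_transcripts(empty))  # []
```

An empty list here means the operation finished but nothing was recognized, which points at the audio or its declared encoding rather than at the polling code.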
Details of the raw audio file:
- Sample rate 16000 Hz (also tried 41000 Hz)
- 16-bit little-endian
- Signed
- 65 seconds
To record the audio, I ran:
arecord -f cd -d 65 -r 16000 -t raw test.raw
Any advice that could point me in the right direction would be greatly appreciated.
Your code is basically the same as this example, which works for me with the test audio files.
Does your code work with the test sample audio.raw? If so, it is most likely an encoding issue. I have had the most success with FLAC files and with recording audio as recommended in the best practices. I've also used Audacity in the past to take some of the guesswork out of recording.
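One quick way to test the encoding theory on a headerless raw file is to compare its size with what the declared format implies. The sketch below is only arithmetic: bytes = seconds x sample rate x channels x bytes per sample. It may be relevant here because, as far as I know, arecord's `-f cd` preset means 16-bit little-endian, 44100 Hz, stereo, so adding `-r 16000` would change the rate but leave two channels, while the speech API generally expects mono. The helper names are hypothetical.

```python
import os


def expected_raw_size(seconds, sample_rate, channels=1, sample_width=2):
    """Bytes of headerless PCM: duration x rate x channels x bytes per sample."""
    return seconds * sample_rate * channels * sample_width


def guess_channels(path, seconds, sample_rate, sample_width=2):
    """Estimate the channel count of a raw PCM file from its size on disk."""
    actual = os.path.getsize(path)
    mono = expected_raw_size(seconds, sample_rate, 1, sample_width)
    return round(actual / mono)


print(expected_raw_size(65, 16000))              # 2080000 bytes for mono
print(expected_raw_size(65, 16000, channels=2))  # 4160000 bytes for stereo
```

If `guess_channels('test.raw', 65, 16000)` returns 2, re-recording with a single channel (or downmixing) would be the first thing to try.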
On Mac OS X, the following shell command worked to record 65 seconds of audio:
rec --channels=1 --bits=16 --rate=44100 audio.wav trim 0 65
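Before sending a recording like this to the API, it can help to confirm that the file really has the channel count, bit depth, and sample rate you intend to declare. A minimal sketch using Python's standard `wave` module follows; the helper names `describe_wav` and `matches_config` are hypothetical, and the check only applies to WAV files, not to headerless raw audio.

```python
import wave


def describe_wav(path):
    """Read channel count, bit depth, rate and duration from a WAV header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bits": wav.getsampwidth() * 8,
            "rate": rate,
            "seconds": wav.getnframes() / float(rate),
        }


def matches_config(info, rate, channels=1, bits=16):
    """True if the file's format equals the values declared in the request."""
    return (info["rate"], info["channels"], info["bits"]) == (rate, channels, bits)
```

For the recording above, `matches_config(describe_wav('audio.wav'), 44100)` should be true if the file is mono, 16-bit, 44100 Hz.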
Then I use the following code to transcribe the audio:
import io
import time

from google.cloud import speech


# The function name is an assumption; the snippet needs an enclosing
# function for the early `return` and the `speech_file` argument.
def main(speech_file):
    """Transcribe the given audio file asynchronously."""
    speech_client = speech.Client()

    with io.open(speech_file, 'rb') as audio_file:
        content = audio_file.read()
        audio_sample = speech_client.sample(
            content,
            source_uri=None,
            encoding='LINEAR16',
            sample_rate=44100)

    operation = speech_client.speech_api.async_recognize(audio_sample)

    # Poll every 2 seconds and give up after 100 attempts.
    retry_count = 100
    while retry_count > 0 and not operation.complete:
        retry_count -= 1
        time.sleep(2)
        operation.poll()

    if not operation.complete:
        print('Operation not complete and retry limit reached.')
        return

    alternatives = operation.results
    for alternative in alternatives:
        print('Transcript: {}'.format(alternative.transcript))
Please note that my example uses the newer client library, which makes API access easier. This code sample is the starting point I based my example on.
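The bounded retry loop in the code above can also be factored into a small reusable helper, which avoids the endless `while True` wait from the question. This is only a sketch: `poll_until` is a hypothetical name, it is not part of any Google library, and the `sleep` parameter exists only so the delay can be swapped out in tests.

```python
import time


def poll_until(check, interval=2.0, max_attempts=100, sleep=time.sleep):
    """Call check() until it returns a truthy value or attempts run out.

    Returns the truthy value, or None if the retry limit is reached.
    """
    for attempt in range(max_attempts):
        result = check()
        if result:
            return result
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval)
    return None
```

With the raw REST client from the question, `check` could be a function that executes the operations request and returns the response only once its `done` field is true.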