Amazon Polly in an Alexa Skill?

Can Amazon Polly be used in an Alexa skill to deliver, for example, a two-language answer in a translation or other multilingual context? And if so, does anyone have experience using this service from a Lambda function?



1 answer


The short answer is no: it is currently not possible to use Amazon Polly directly in an Alexa skill. Polly's MP3 output is not encoded at the bit rate that Alexa skills require.

However, you can convert the Polly MP3 file to the required 48 kbps bit rate and then use it in the SSML audio tag, as shown by the open source project alexa-meets-polly.
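As a rough illustration of the last step, here is a minimal Python sketch that wraps an already converted MP3 in an SSML audio tag and builds the skill response. The function name `build_audio_response` and the example URL are hypothetical and not taken from the project; the sketch assumes the file is served over HTTPS, which Alexa requires for audio sources.

```python
from xml.sax.saxutils import quoteattr


def build_audio_response(mp3_url: str, end_session: bool = True) -> dict:
    """Build an Alexa response that plays a hosted MP3 via the SSML audio tag.

    Hypothetical helper for illustration; mp3_url is assumed to point at an
    MP3 that has already been converted to the settings Alexa accepts.
    """
    if not mp3_url.startswith("https://"):
        raise ValueError("Alexa only plays audio served over HTTPS")

    # quoteattr escapes characters such as '&' and adds the surrounding quotes.
    ssml = f"<speak><audio src={quoteattr(mp3_url)}/></speak>"

    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            "shouldEndSession": end_session,
        },
    }
```

A Lambda handler could return `build_audio_response("https://example.com/good-morning-pl.mp3")` directly as its result.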



From alexa-meets-polly:

  • The user speaks to the Alexa device and asks, for example, "What does 'Good morning' mean in Polish?"

  • The Alexa service maps the utterance to a translation intent and passes in a language slot with the value Polish and a term slot with the value Good Morning. The AWS Lambda function contained in this repo receives the speechlet request.

  • Before starting the translation process, the skill first looks in its dictionary, where all previous translations are stored. If it finds a Polish entry for Good Morning in the database, it skips the entire round trip (steps 4-9 of this list) and immediately uses the S3 audio file referenced in the DynamoDB entry.

  • However, if Good Morning has not been requested in Polish before, the skill has the term translated into Polish using the Microsoft Translator API (or, interchangeably, Google Translate).

  • The resulting translation is then passed to AWS Polly along with the desired VoiceId. Polly returns an MP3 stream of the spoken term.

  • The stream is saved to AWS S3 as an MP3 file. Unfortunately, it is not yet ready to be returned and played back in Alexa, because the SSML audio tag in Alexa requires different audio settings.

  • The skill passes the MP3 URL to a custom service endpoint hosted on an AWS EC2 instance. This service uses FFmpeg to convert the bit rate to 48 kbps, as required by Alexa. Polly's voices are not as loud as Alexa's, even when their volume is raised to the maximum. That is why this conversion also increases the volume by 10 dB.

  • The resulting MP3 overwrites the original file in S3.

  • The URL of this file, which has not changed, is returned to the calling Lambda function.

  • Finally, an entry for Good Morning in Polish is created in the DynamoDB dictionary. A separate entry is created for the user that links to this dictionary entry, so the skill keeps a reference to the last translation made for that user. This allows the user to ask Alexa to repeat the most recent translation.

  • The skill creates the output speech text, which is wrapped in an SSML audio tag containing the MP3 URL.

  • The output speech is returned to the device. Alexa responds by playing the translated term in one of Polly's voices. In addition, a card showing the translation is returned to the Alexa app, which can be very helpful for users.
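The dictionary lookup and the FFmpeg conversion in the list above could be sketched roughly as follows. This is not the project's code: the names `ffmpeg_args` and `lookup_or_create` are hypothetical, a plain dict stands in for the DynamoDB dictionary, and the 24000 Hz sample rate is an assumption (the answer only states the 48 kbps bit rate and the 10 dB gain).

```python
from typing import Callable, Dict, List, Tuple


def ffmpeg_args(src: str, dst: str, gain_db: int = 10) -> List[str]:
    """Return an FFmpeg command line that re-encodes an MP3 for Alexa.

    Re-encodes to 48 kbps and raises the volume by gain_db decibels.
    The 24000 Hz sample rate is an assumption, not stated in the answer.
    """
    return [
        "ffmpeg", "-y",               # overwrite the destination if it exists
        "-i", src,
        "-codec:a", "libmp3lame",
        "-b:a", "48k",                # bit rate required by Alexa
        "-ar", "24000",               # assumed sample rate
        "-af", f"volume={gain_db}dB", # boost the quieter Polly voice
        dst,
    ]


def lookup_or_create(
    term: str,
    language: str,
    cache: Dict[Tuple[str, str], str],
    create: Callable[[str, str], str],
) -> str:
    """Return the MP3 URL for (term, language), creating it only on a miss.

    `cache` stands in for the DynamoDB dictionary; `create` stands in for
    the translate, synthesize, store, and convert steps and returns a URL.
    """
    key = (term.lower(), language.lower())
    if key not in cache:
        cache[key] = create(term, language)
    return cache[key]
```

The argument list can be handed to `subprocess.run(ffmpeg_args("in.mp3", "out.mp3"), check=True)` on a machine where FFmpeg is installed. Passing the expensive steps in as a callable keeps the cache logic testable without any AWS access.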
