Supporting multi-modal input in an Alexa skill with both voice dialogs and on-screen APL components

I am working on an Alexa skill where we are setting up a user's profile for future interactions and notifications. We have successfully built a purely voice-driven skill utilizing the proper intents and slots.

I wanted to expand this skill for our hard-of-hearing users by adding Alexa Presentation Language (APL) on-screen touch widgets that a user can tap to provide an answer instead of speaking it. I've run into problems with our first input, a time picker, expressed as an APL AlexaTextList component. When the user reaches the time question, Alexa properly displays the interface, and the user may select an option, which is successfully transmitted to our server via the SendEvent APL command. (We are hosting the Alexa endpoint ourselves on a Django application server using the Python ASK SDK, and NOT using Lambdas.)
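For context, here is roughly how the list is wired up. This is a minimal sketch of the document we pass with RenderDocument; the datasource path and the SendEvent arguments are placeholders rather than our exact values:

```python
# Trimmed sketch of the APL document sent with RenderDocument.
# Tapping a list item runs primaryAction, whose SendEvent command arrives
# at our endpoint as an Alexa.Presentation.APL.UserEvent request.
TIME_PICKER_DOC = {
    "type": "APL",
    "version": "2023.2",
    "import": [{"name": "alexa-layouts", "version": "1.7.0"}],
    "mainTemplate": {
        "parameters": ["payload"],
        "items": [{
            "type": "AlexaTextList",
            "headerTitle": "What time works for you?",
            "listItems": "${payload.timeList.items}",  # e.g. "8:00 AM", "9:00 AM", ...
            "primaryAction": {
                "type": "SendEvent",
                # These land in request.arguments on the server.
                "arguments": ["timeSelected", "${ordinal}", "${primaryText}"]
            }
        }]
    }
}
```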

After a LENGTHY search, I was unable to find any command or method that would allow the APL document to fill the appropriate slot in the original dialog intent.

On the server side, I ended up saving the responses in the skill's session attributes for later use. After saving those values, I attempted to return a response to the UserEvent request with an ElicitSlot directive to ask the next question in the dialog sequence. I received an INVALID_RESPONSE from Alexa, with the error

Directive "Dialog.ElicitSlot" is allowed only when an intent is being processed

I ran into a similar issue when I replaced the ElicitSlot directive with a Delegate directive in an attempt to restart the dialog intent:

Directive "Dialog.Delegate" cannot be used in response to an event

Questions:

  1. From a high level, am I correct in believing (as I do now) that the interaction between dialog intents and APL documents is purely one-way? Intents can include directives that influence the content and presentation of APL documents, but actions within an APL document cannot influence the intent that rendered it?

  2. Is there any method, in the context of servicing a UserEvent request, by which the response can resume the dialog intent that was interrupted by the event and continue it?

  3. Is there a method, in the context of servicing a UserEvent request, by which the response can restart or launch a specified dialog intent without relying on the user to utter one of its sample utterances?

Thank you in advance - I feel like I'm missing something blindingly obvious or this is a massive functional gap in the Alexa platform.

Answer:

In response to a UserEvent request, you can include any custom skill directive except Dialog.Delegate. Currently there is no method that can be used in a UserEvent response handler to resume or restart an intent's dialog delegation flow.

Instead, you can keep saving the values from touch events as session attributes, as you are doing already, and have Alexa respond with output speech asking the user to verbally confirm their selection, plus an APL RenderDocument directive (on devices that support APL) showing the same confirmation prompt. You can handle the yes/no responses with the built-in AMAZON.YesIntent and AMAZON.NoIntent; those handlers run in the context of an intent, so they can respond using the Dialog interface. A sketch of this flow follows.
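A minimal sketch of this confirm-then-resume pattern, assuming a hypothetical SetupProfileIntent with a time slot; handler and attribute names are illustrative:

```python
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name, is_request_type
from ask_sdk_model.dialog import DelegateDirective
from ask_sdk_model import Intent, Slot

class TimeTouchedHandler(AbstractRequestHandler):
    """APL touch event: save the value, then ask for verbal confirmation."""
    def can_handle(self, handler_input):
        return is_request_type("Alexa.Presentation.APL.UserEvent")(handler_input)

    def handle(self, handler_input):
        args = handler_input.request_envelope.request.arguments
        attrs = handler_input.attributes_manager.session_attributes
        attrs["preferred_time"] = args[2]
        # No Dialog directives here -- plain speech (optionally paired with a
        # RenderDocument directive showing the same confirmation text).
        return (
            handler_input.response_builder
            .speak(f"You picked {args[2]}. Is that right?")
            .ask("Was that time right?")
            .response
        )

class ConfirmTimeYesHandler(AbstractRequestHandler):
    """Verbal yes: an intent is now being processed, so Dialog directives work."""
    def can_handle(self, handler_input):
        return is_intent_name("AMAZON.YesIntent")(handler_input)

    def handle(self, handler_input):
        attrs = handler_input.attributes_manager.session_attributes
        # Chain back into the dialog intent, pre-filling the slot that was
        # answered by touch, and let Alexa prompt for whatever is still empty.
        updated = Intent(
            name="SetupProfileIntent",
            slots={"time": Slot(name="time", value=attrs.get("preferred_time"))},
        )
        return (
            handler_input.response_builder
            .add_directive(DelegateDirective(updated_intent=updated))
            .response
        )
```

Note that the Yes handler delegates rather than speaking: with Dialog.Delegate, Alexa generates the next prompt from the dialog model, so the skill does not need to restate the question itself.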