Task: We are building a Dialogflow agent that will interact with callers via our Cisco telephony stack. We need to collect alphanumeric credentials from the caller.

Here is our proposed architecture:

[architecture diagram]

Problem: In order to send text inputs to Dialogflow, we are using Google Cloud's Speech-to-Text (STT) API to convert the caller's audio to text. However, the STT API does not always perform as desired. For example, if a caller says their DOB is 04-04-90, the transcribed audio may come back as "oh for oh 490". The transcription can be greatly improved by passing phrase hints to the API, so we would need to send these hints dynamically based on the scenario. Unfortunately, we're struggling to understand how to dynamically pass these phrase hints through the UniMRCP server, specifically the Google Speech Recognition plugin.
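For a DOB prompt, the hint list itself can be generated per call. A minimal sketch of what "dynamic" hints might look like on our side (the function name is ours, and how the phrases reach STT still depends on the plugin configuration discussed below):

```python
from datetime import date

def dob_hints(dob: date) -> list[str]:
    """Build phrase hints covering the spoken forms a caller
    is likely to use for a date of birth."""
    month_name = dob.strftime("%B")            # e.g. "April"
    return [
        dob.strftime("%m %d %Y"),              # "04 04 1990"
        dob.strftime("%m %d %y"),              # "04 04 90"
        f"{month_name} {dob.day} {dob.year}",  # "April 4 1990"
    ]

hints = dob_hints(date(1990, 4, 4))
```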

Question: Section 5.2 of the Google Speech Recognition manual outlines using Dynamic Speech Contexts.

The example provided is:

<grammar mode="voice" root="booking" version="1.0" xml:lang="en-US" xmlns="http://www.w3.org/2001/06/grammar">
    <meta name="scope" content="hint"/>
    <rule id="booking">
        <one-of>
            <item> 04 04 1990</item>
            <item> 04 04 90</item>
            <item> April 4th 1990</item>
        </one-of>
    </rule>
</grammar>

Does this still transcribe all user input, similar to how the built-in grammar builtin:speech/transcribe behaves?

For example, if I were to say March 5th 1980, would Google's STT return March 5th 1980, or only one of the provided items?

To be clear, I would want Google's STT to be able to return more than just the provided items, so if the user says March 5th 1980, I would want that returned through UniMRCP, VBB, and CVP and passed along to Dialogflow. I am being told that even if STT returned March 5th 1980, CVP or the voice browser might still evaluate it as a "no match".

Answer:
Dialogflow accepts more than text input: it can also perform intent detection directly on an audio file or on an audio stream.
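For instance, Dialogflow ES's detectIntent REST method accepts base64-encoded audio together with an audio config, so the STT step could be delegated to Dialogflow itself. A minimal sketch of building such a request body (the helper name is ours; the session URL, authentication, and actual HTTP call are out of scope here):

```python
import base64

def build_detect_intent_request(audio_bytes, language="en-US", rate=8000):
    """Build the JSON body for a Dialogflow ES v2 detectIntent call
    that submits raw audio instead of text."""
    return {
        "queryInput": {
            "audioConfig": {
                "audioEncoding": "AUDIO_ENCODING_LINEAR_16",
                "sampleRateHertz": rate,       # telephony audio is often 8 kHz
                "languageCode": language,
            }
        },
        # detectIntent expects the audio itself base64-encoded
        "inputAudio": base64.b64encode(audio_bytes).decode("ascii"),
    }
```

This body would be POSTed to the session's detectIntent endpoint; phrase hints can then be supplied via the query parameters of that same request rather than through the MRCP layer.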