Conversational IVR using Cisco CVP and CVA with Google Dialogflow


Cisco and Google have collaborated in customer care since 2018 and strengthened this partnership on April 21st, 2020 with a network bridge collaboration. Expertflow has been closely involved in this partnership, winning the first joint Cisco-Google sponsored developer award in 2018.

In this space, we have deployed multiple Hybrid Chat projects with Cisco UCCE, and we have started deploying Virtual Assistant projects using the new Customer Virtual Assistant (CVA) node of Cisco Customer Voice Portal (CVP) 12.5, which is available in a Cisco dCloud demo.

Cisco CVP with the CVA node and Dialogflow works very well, and Google's Automatic Speech Recognition (ASR) does a fantastic job. Google Dialogflow's NLU is somewhat limited, however: it returns only one intent per utterance, and conversation management is only possible through the concept of contexts. In our projects, we found that this style of dialogue management is sufficient for very simple, structured dialogues or demos, but it quickly breaks down when a customer strays from the happy path. We tried to work around these limitations by storing conversation states in a database, but the conversation flows became too complex and difficult to manage. Google has announced a "Visual Builder" feature for autumn 2020 (see the Google Contact Center AI tech deep dive of April 16th) that might address this.
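To make the contexts limitation concrete, here is a toy simulation of context-gated intent matching. It is not Dialogflow's implementation: training phrases are reduced to exact string matches, and the intent and context names (`book_appointment`, `awaiting_date`) are hypothetical. What it does mirror is the idea that an intent may require active input contexts, and that output contexts expire after a fixed number of turns (their lifespan).

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    name: str
    phrases: set[str]                                              # exact-match stand-in for training phrases
    input_contexts: set[str] = field(default_factory=set)          # all must be active to match
    output_contexts: dict[str, int] = field(default_factory=dict)  # context name -> lifespan in turns


class ContextDialogue:
    """Toy model of context-gated intent matching; not Dialogflow's implementation."""

    def __init__(self, intents: list[Intent]) -> None:
        self.intents = intents
        self.active: dict[str, int] = {}  # context name -> remaining turns

    def handle(self, utterance: str) -> str:
        text = utterance.lower().strip()
        matched = None
        for intent in self.intents:
            if text in intent.phrases and intent.input_contexts <= set(self.active):
                matched = intent
                break
        # Every turn uses up one unit of lifespan of each active context.
        self.active = {c: n - 1 for c, n in self.active.items() if n > 1}
        if matched is None:
            return "fallback"
        self.active.update(matched.output_contexts)
        return matched.name


INTENTS = [
    Intent("book_appointment", {"i want an appointment"}, output_contexts={"awaiting_date": 2}),
    Intent("provide_date", {"tomorrow", "monday"}, input_contexts={"awaiting_date"}),
]

happy = ContextDialogue(INTENTS)
print(happy.handle("I want an appointment"), happy.handle("Tomorrow"))  # book_appointment provide_date

detour = ContextDialogue(INTENTS)
for turn in ["I want an appointment", "What are your opening hours?", "Do you have parking?", "Tomorrow"]:
    print(detour.handle(turn))  # book_appointment, then fallback three times
```

On the happy path the date is understood, but two off-path questions are enough for `awaiting_date` to expire, after which "Tomorrow" no longer matches anything. Keeping such state alive across detours is exactly what pushed us toward storing state externally.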

Diagram of user interacting with intents and context.

A look at our Customer Interaction Architecture hints at an alternative: for more complex scenarios and multi-turn dialogues, we use an alternative dialogue engine, in this case Rasa Core. The choice of dialogue engine also depends on the vertical, the environment, the languages and so on, so there is no one-size-fits-all solution. Dialogues are not explicitly hard-coded but are learned from many sample stories. Trained on these stories, the dialogue model can learn over time to handle many different scenarios, usually without explicit programming.
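The idea of learning dialogue behaviour from stories rather than hard-coding it can be sketched with a toy lookup policy. Rasa Core combines several policies, including a memoization policy and machine-learned ones; the class below only imitates the lookup idea and is not Rasa's API. The intents, actions and `max_history` value are hypothetical. Each story is a sequence of (user intent, bot action) turns; the policy remembers which action followed each recent intent history and backs off to shorter histories when a longer one was never seen.

```python
from collections import Counter, defaultdict

Story = list[tuple[str, str]]  # one (user intent, bot action) pair per turn


class StoryPolicy:
    """Toy lookup policy learned from sample stories (hypothetical; not Rasa's API)."""

    def __init__(self, max_history: int = 2) -> None:
        self.max_history = max_history
        self.table: dict[tuple[str, ...], Counter] = defaultdict(Counter)

    def train(self, stories: list[Story]) -> None:
        for story in stories:
            intents = [intent for intent, _ in story]
            for i, (_, action) in enumerate(story):
                # Remember the action under every history length up to max_history.
                for n in range(1, min(self.max_history, i + 1) + 1):
                    self.table[tuple(intents[i + 1 - n : i + 1])][action] += 1

    def predict(self, intents: list[str]) -> str:
        # Prefer the longest remembered history, then back off to shorter ones.
        for n in range(min(self.max_history, len(intents)), 0, -1):
            key = tuple(intents[-n:])
            if key in self.table:
                return self.table[key].most_common(1)[0][0]
        return "action_default_fallback"


STORIES = [
    [("greet", "utter_greet"), ("check_balance", "utter_ask_account"), ("inform_account", "action_show_balance")],
    [("check_balance", "utter_ask_account"), ("inform_account", "action_show_balance")],
    [("report_lost_card", "utter_ask_account"), ("inform_account", "action_block_card")],
]

policy = StoryPolicy(max_history=2)
policy.train(STORIES)
print(policy.predict(["check_balance", "inform_account"]))     # action_show_balance
print(policy.predict(["report_lost_card", "inform_account"]))  # action_block_card
print(policy.predict(["greet", "report_lost_card"]))           # utter_ask_account (backed off)
```

The same `inform_account` intent leads to different actions depending on what came before, and no branch for that was written by hand; adding stories changes the behaviour. The last call shows a sequence that appears in no story still getting a sensible answer through back-off.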

We use Cisco Virtualized Voice Browser as the entry point, manage the conversation with the Cisco CVP VXML Call Server, hand off individual sentences to Google Dialogflow for ASR and NLU detection, and then feed the recognized intents and entities to a conversation engine that is capable of handling more complex multi-turn conversations.
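One way to feed Dialogflow's recognized intent and entities into Rasa is sketched below. It relies on two things we are confident of: the JSON shape of a Dialogflow ES v2 `detectIntent` response (`queryResult.intent.displayName`, `queryResult.parameters`), and Rasa's support for messages of the form `/intent_name{"entity": "value"}`, which set intent and entities directly instead of running Rasa's own NLU. The assumptions are that Rasa's REST channel is enabled at `http://localhost:5005/webhooks/rest/webhook` and that intent names are kept identical in both engines; the function names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint: Rasa's REST input channel on its default port.
RASA_URL = "http://localhost:5005/webhooks/rest/webhook"


def to_rasa_message(detect_intent_response: dict) -> str:
    """Turn a Dialogflow v2 detectIntent JSON response into Rasa's '/intent{entities}' form."""
    result = detect_intent_response["queryResult"]
    intent = result["intent"]["displayName"]  # assumed to equal the Rasa intent name
    # Drop unfilled parameters so Rasa only sees entities that were actually recognized.
    entities = {k: v for k, v in result.get("parameters", {}).items() if v not in ("", [], None)}
    return f"/{intent}{json.dumps(entities)}" if entities else f"/{intent}"


def send_to_rasa(session_id: str, message: str) -> list:
    """POST one turn to Rasa's REST channel and return the list of bot responses."""
    body = json.dumps({"sender": session_id, "message": message}).encode("utf-8")
    req = urllib.request.Request(RASA_URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


sample = {
    "queryResult": {
        "queryText": "please block my debit card",
        "intent": {"displayName": "report_lost_card"},
        "parameters": {"card_type": "debit", "account": ""},
    }
}
print(to_rasa_message(sample))  # /report_lost_card{"card_type": "debit"}
```

Using the CVP call or session ID as Rasa's `sender` keeps each caller's dialogue state separate on the Rasa side.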

Alternative deployment models would use other speech engines for languages that Google Speech does not support, or where cloud solutions are not an option. There are also multiple alternatives for the NLU layer that can be fine-tuned with a bespoke language model.
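Such a deployment choice can be expressed as a small routing rule: pick the first speech engine that supports the caller's language and respects whether cloud processing is allowed. The engine names and language lists below are purely hypothetical placeholders; real coverage has to be checked against each vendor's documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechEngine:
    name: str
    languages: frozenset[str]
    on_premise: bool


# Hypothetical catalogue; language lists are placeholders, not vendor facts.
ENGINES = [
    SpeechEngine("cloud-asr", frozenset({"en-US", "de-DE"}), on_premise=False),
    SpeechEngine("onprem-asr", frozenset({"en-US", "ar-SA"}), on_premise=True),
]


def choose_engine(language: str, cloud_allowed: bool, engines: list[SpeechEngine] = ENGINES) -> SpeechEngine:
    """Return the first engine that supports the language and satisfies the cloud policy."""
    for engine in engines:
        if language in engine.languages and (cloud_allowed or engine.on_premise):
            return engine
    raise LookupError(f"no speech engine for {language!r} (cloud_allowed={cloud_allowed})")


print(choose_engine("en-US", cloud_allowed=True).name)   # cloud-asr
print(choose_engine("en-US", cloud_allowed=False).name)  # onprem-asr
```

The order of the catalogue encodes preference, so the cloud engine wins whenever it is both permitted and capable.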

One tweak for the Rasa integration is to set the Recognize.singleUtterance parameter to TRUE, because you want to hand control to Dialogflow only to recognize a single customer utterance and return the recognized text to CVP, which can then invoke a third-party NLU.
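When Dialogflow is used only as a speech recognizer, the glue code keeps the transcript and discards Dialogflow's own intent. The sketch below reads `queryResult.queryText` and `queryResult.speechRecognitionConfidence` from a Dialogflow ES v2 response and wraps the text as `{"text": ...}`, the body Rasa's `POST /model/parse` endpoint expects; other NLU engines will need a different body. The confidence threshold and the function name are hypothetical.

```python
import json
from typing import Optional

MIN_ASR_CONFIDENCE = 0.5  # hypothetical threshold below which the VXML flow should re-prompt


def third_party_nlu_body(detect_intent_response: dict) -> Optional[bytes]:
    """Keep only Dialogflow's transcript and wrap it for a third-party NLU.

    Returns None when nothing usable was recognized, so the caller can re-prompt.
    """
    result = detect_intent_response.get("queryResult", {})
    text = result.get("queryText", "").strip()
    # An absent or 0.0 confidence is treated as "not reported" rather than as low confidence.
    confidence = result.get("speechRecognitionConfidence") or None
    if not text or (confidence is not None and confidence < MIN_ASR_CONFIDENCE):
        return None
    # Dialogflow's intent is deliberately ignored; the third-party NLU decides.
    return json.dumps({"text": text}).encode("utf-8")


print(third_party_nlu_body({"queryResult": {"queryText": "I lost my card", "speechRecognitionConfidence": 0.92}}))
```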

Finally, and most importantly: because Expertflow CIM captures these individual steps, the data becomes accessible and transferable between multiple AI engines, and selected agent activities can serve as tagging information to train the AI engines. The AI engines thereby learn from real-world dialogues.
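As an illustration of how captured steps could become training data, the sketch below assumes a hypothetical export in which each customer turn carries the intent the AI engine predicted and, optionally, an intent an agent confirmed or corrected. Only agent-tagged turns are used as training examples, so the engine is not retrained on its own guesses; the override rate gives a rough signal of where the model disagrees with agents. The record layout and function names are assumptions, not a CIM API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class CapturedTurn:
    """One customer utterance as exported from CIM (hypothetical record layout)."""
    text: str
    predicted_intent: str               # what the AI engine recognized
    agent_intent: Optional[str] = None  # set when an agent confirmed or corrected the turn


def training_examples(turns: list[CapturedTurn]) -> dict[str, list[str]]:
    """Group agent-tagged utterances by intent; untagged turns are skipped."""
    examples: dict[str, list[str]] = defaultdict(list)
    for turn in turns:
        if turn.agent_intent is not None and turn.text not in examples[turn.agent_intent]:
            examples[turn.agent_intent].append(turn.text)
    return dict(examples)


def override_rate(turns: list[CapturedTurn]) -> float:
    """Share of agent-tagged turns where the agent disagreed with the engine."""
    tagged = [t for t in turns if t.agent_intent is not None]
    if not tagged:
        return 0.0
    return sum(t.agent_intent != t.predicted_intent for t in tagged) / len(tagged)


turns = [
    CapturedTurn("my card was stolen", "check_balance", "report_lost_card"),
    CapturedTurn("how much money do I have", "check_balance", "check_balance"),
    CapturedTurn("hello", "greet"),
]
print(training_examples(turns))
print(override_rate(turns))  # 0.5
```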