Dialogflow ES / CX

Dialogflow is a natural language understanding platform that enables conversational experiences through simple integration into a bot or any other application. It analyzes incoming text from end users and responds with text as output.

Dialogflow is available in two editions: Dialogflow CX (advanced) and Dialogflow ES (standard). Both editions are integrated into CVG.

Project Setup

To build voice bots with Dialogflow and CVG, you need an account on both platforms.

Dialogflow

To use Dialogflow, you must additionally create a service account, which provides the authentication key.

Follow the links below to start the integration process:

Once you understand the basics of Dialogflow from the Dialogflow documentation, log in to the Dialogflow console with your Google account. Then create an agent in Dialogflow, providing the agent name and your preferred values for the remaining fields. Initially, the agent comes with two default intents: the default welcome intent and the default fallback intent. You can add training phrases to these intents to cover the input phrases users are likely to use for each intent. Depending on your scenario, you can add any number of intents covering any number of user input phrases.

Create the service account key using the link provided above. Make sure to assign at least the role Dialogflow Service Agent in order to allow intent detection. Once the account is created, create a key for it in Google’s JSON format and download that key.

CVG

If you need a CVG account, please contact support@vier.ai.

To set up the Dialogflow project in CVG, create a project in CVG and fill in the fields in each section. In the bot configuration section:

  • Select either “Dialogflow ES” or “Dialogflow CX” as the template.

  • Provide the required fields for your specific Dialogflow project as provided by Google (the location and agent ID for CX, the environment and user ID for ES).

  • Either paste the contents of the previously created service account key or select the JSON file from your local filesystem.

Communication

From CVG to Dialogflow (Events)

Normal spoken input from the user is transmitted to Dialogflow as text input. In addition to the input, CVG creates a session parameter called utterance in CX; in ES, it adds a short-lived input context that is also called utterance. Both the session parameter and the context have the following content:

{
  "language": "de-DE",
  "confidence": 91
}

DTMF tones will be transmitted as DTMF inputs to Dialogflow CX. Dialogflow ES does not explicitly support DTMF inputs, so CVG sends them as text inputs, but attaches a short-lived context named dtmf without any additional data.

All other voice events from CVG are transmitted to Dialogflow as event inputs. In Dialogflow ES, the events carry a payload. Dialogflow CX does not have event payloads; instead, CVG creates session parameters with the same name as the event. Note that due to the nature of CX's session parameters, these values remain in the session permanently. New events overwrite the parameters of previous events with the same name.

All event payloads contain at least the following three fields:

  • dialogId: This is the globally unique id assigned to the dialog by CVG.

  • projectContext: This object contains the resellerToken and projectToken, which can be required for certain API calls.

  • timestamp: This is the point in time (unix timestamp in milliseconds) the event occurred in CVG.
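As a rough illustration of how a bot backend might read these shared fields, here is a minimal Python sketch. The names EventContext and parse_event_context are hypothetical and not part of CVG or Dialogflow; the sketch only assumes that the payload has already been decoded into a dictionary with the fields described above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class EventContext:
    """Fields shared by every CVG event payload (hypothetical helper type)."""
    dialog_id: str
    reseller_token: str
    project_token: str
    occurred_at: datetime


def parse_event_context(payload: dict) -> EventContext:
    """Extract the common fields from an already decoded event payload."""
    project = payload["projectContext"]
    return EventContext(
        dialog_id=payload["dialogId"],
        reseller_token=project["resellerToken"],
        project_token=project["projectToken"],
        # CVG reports a unix timestamp in milliseconds, so convert to seconds.
        occurred_at=datetime.fromtimestamp(payload["timestamp"] / 1000, tz=timezone.utc),
    )
```

Converting the timestamp to a timezone-aware datetime avoids ambiguity when the value is logged or compared later.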

Here is a list of the available events from CVG:

  • greeting: This event is sent once before anything else to allow the bot to respond e.g. with a greeting. It contains additional data, for example:

    {
      "dialogId": "09e59647-5c77-4c02-a1c5-7fb2b47060f1",
      "projectContext": {
        "resellerToken": "ed4aff6d-c6f8-4ac9-ab67-d072ef45d9a0",
        "projectToken": "d30b1c38-b2fd-49c8-bec2-b268871338b0"
      },
      "timestamp": 1535546718115,
      "local": "+49123412341234",
      "remote": "+49567856785678",
      "language": "de-DE",
      "customSipHeaders": {
        "X-SomeCustomHeader": ["value"]
      }
    }
    
  • termination: Signals that the conversation has been terminated by the user. It contains additional data, for example:

    {
      "dialogId": "09e59647-5c77-4c02-a1c5-7fb2b47060f1",
      "projectContext": {
        "resellerToken": "ed4aff6d-c6f8-4ac9-ab67-d072ef45d9a0",
        "projectToken": "d30b1c38-b2fd-49c8-bec2-b268871338b0"
      },
      "timestamp": 1535546718115,
      "reason": "botDisconnected"
    }
    
  • inactive: Signals that the inactivity timeout has been triggered due to a lack of user input. It contains additional data, for example:

    {
      "dialogId": "09e59647-5c77-4c02-a1c5-7fb2b47060f1",
      "projectContext": {
        "resellerToken": "ed4aff6d-c6f8-4ac9-ab67-d072ef45d9a0",
        "projectToken": "d30b1c38-b2fd-49c8-bec2-b268871338b0"
      },
      "timestamp": 1535546718115,
      "duration": 5000
    }
    
  • recording: Signals a change in the recording status, for example:

    {
      "dialogId": "09e59647-5c77-4c02-a1c5-7fb2b47060f1",
      "projectContext": {
        "resellerToken": "ed4aff6d-c6f8-4ac9-ab67-d072ef45d9a0",
        "projectToken": "d30b1c38-b2fd-49c8-bec2-b268871338b0"
      },
      "timestamp": 1535546718115,
      "status": "Available",
      "recordingId": "string"
    }
    
  • answer-number: The result of a prompt (see next section) with type Number, for example:

    {
      "dialogId": "09e59647-5c77-4c02-a1c5-7fb2b47060f1",
      "projectContext": {
        "resellerToken": "ed4aff6d-c6f8-4ac9-ab67-d072ef45d9a0",
        "projectToken": "d30b1c38-b2fd-49c8-bec2-b268871338b0"
      },
      "timestamp": 1535546718115,
      "confidence": 100,
      "language": "en-US",
      "type": {
        "name": "Number",
        "value": "8342"
      }
    }
    
  • answer-multiple-choice: The result of a prompt (see next section) with type MultipleChoice, for example:

    {
      "dialogId": "09e59647-5c77-4c02-a1c5-7fb2b47060f1",
      "projectContext": {
        "resellerToken": "ed4aff6d-c6f8-4ac9-ab67-d072ef45d9a0",
        "projectToken": "d30b1c38-b2fd-49c8-bec2-b268871338b0"
      },
      "timestamp": 1535546718115,
      "confidence": 83,
      "language": "en-US",
      "type": {
        "name": "MultipleChoice",
        "id": "no",
        "synonym": "never"
      }
    }
    
  • answer-timeout: The result of any prompt (see next section) that did not receive an answer within its specified timeout, for example:

    {
      "dialogId": "09e59647-5c77-4c02-a1c5-7fb2b47060f1",
      "projectContext": {
        "resellerToken": "ed4aff6d-c6f8-4ac9-ab67-d072ef45d9a0",
        "projectToken": "d30b1c38-b2fd-49c8-bec2-b268871338b0"
      },
      "timestamp": 1535546718115,
      "confidence": 100,
      "language": "en-US",
      "type": {
        "name": "Timeout"
      }
    }
    
  • outbound-success: The success result of forward or bridge (see next section). It signals that the outgoing call has been successfully established. An example:

    {
      "dialogId":"09e59647-5c77-4c02-a1c5-7fb2b47060f1",
      "projectContext": {
        "resellerToken": "ed4aff6d-c6f8-4ac9-ab67-d072ef45d9a0",
        "projectToken": "d30b1c38-b2fd-49c8-bec2-b268871338b0"
      },
      "timestamp":1535546718115,
      "ringTime": 12545,
      "ringStartTimestamp": 1535546718225
    }
    
  • outbound-failure: The failure result of forward or bridge (see next section). It signals that the outgoing call could not be established and provides some details as to why. An example:

    {
      "dialogId":"09e59647-5c77-4c02-a1c5-7fb2b47060f1",
      "projectContext": {
        "resellerToken": "ed4aff6d-c6f8-4ac9-ab67-d072ef45d9a0",
        "projectToken": "d30b1c38-b2fd-49c8-bec2-b268871338b0"
      },
      "timestamp":1535546718115,
      "ringTime": 12545,
      "ringStartTimestamp": 1535546718225,
      "reason": "RING_TIMED_OUT"
    }
    

    Depending on the exact reason (check out the OutboundCallFailure model in the API specification for all possible reasons) there might not be a ringStartTimestamp and the ringTime could be zero.
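Because ringStartTimestamp may be absent and ringTime may be zero, code that consumes an outbound-failure payload should not assume either is present. The following Python sketch shows one defensive way to read them; the function name summarize_outbound_failure and the wording of the summary are illustrative assumptions.

```python
def summarize_outbound_failure(payload: dict) -> str:
    """Build a log-friendly summary of an outbound-failure event payload."""
    reason = payload.get("reason", "unknown")
    ring_time_ms = payload.get("ringTime", 0)
    ring_start = payload.get("ringStartTimestamp")  # may be missing for some reasons

    if ring_start is None or ring_time_ms == 0:
        return f"Outbound call failed ({reason}), no ring information"
    return f"Outbound call failed ({reason}) after ringing for {ring_time_ms / 1000:.1f} s"
```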

From Dialogflow to CVG (Commands)

Dialogflow can send messages to CVG using responses (Dialogflow ES) or fulfillments (Dialogflow CX), both called response from here onwards.

A response can be either text or a custom payload (an arbitrary JSON object). Each intent can have multiple responses, freely mixing text and custom payloads. Each text response can supply multiple different messages, of which Dialogflow will choose one at random. Custom payloads provide additional information to CVG that cannot be included in the text, for example the language and whether the message can be interrupted by the caller. The following sections describe the supported custom payloads in detail.

Say

Say can be used for messages that need some customization.

Options
  • message (required): The message to be said.

  • language (optional): Overrides the synthesizer language for specific messages. (string, e.g. “de-DE”, defaults to the project language)

  • synthesizers (optional): If specified, this parameter overrides the list of synthesizers (or voices) from the project settings. To specify a voice, use the vendor name, such as “GOOGLE”, and attach the synthesizer profile name as a suffix separated by a dash, e.g. “GOOGLE-en-US-Wavenet-H”. Alternatively, the profile token can be used directly (without the vendor name). The first synthesizer in the list has the highest priority. Additional synthesizers are used as a fallback (in order) in case a service is currently unreachable.

  • interpretAs (optional): Explicitly states what the given text should be interpreted as. If omitted, CVG tries to detect SSML and otherwise assumes plain text. Use TEXT if the text sent by your bot might contain XML-like text that could lead to a false SSML detection. Use SSML if the text sent by your bot should always be interpreted as SSML, even if it does not start with a <speak> tag.

  • bargeIn (optional): Allows the message to be interrupted by the speaker. (boolean, default false)

Examples
{
  "status": "say",
  "message": "Hi there!",
  "bargeIn": true,
  "language": "de-CH",
  "synthesizers": ["MICROSOFT-de-CH-LeniNeural"]
}
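If the custom payload is produced by code (for example in a webhook) rather than typed into the console, a small builder keeps the optional fields consistent and ensures each key appears only once. This is a sketch only; build_say and its parameter names are hypothetical, while the emitted keys are the options listed above.

```python
def build_say(message, *, language=None, synthesizers=None,
              interpret_as=None, barge_in=None) -> dict:
    """Build a say payload, leaving out every option that was not provided."""
    payload = {"status": "say", "message": message}
    optional = {
        "language": language,
        "synthesizers": synthesizers,
        "interpretAs": interpret_as,
        "bargeIn": barge_in,
    }
    # Omitted options fall back to the defaults described above.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

For instance, build_say("Hi there!", barge_in=True, language="de-CH") produces a payload with exactly one language entry.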

Termination

This payload hangs up the call after all other messages have been synthesized. Messages with bargeIn enabled will be interrupted by the termination. For messages without bargeIn, the termination happens after the speech output has ended.

Options

No options available.

Examples
{
  "status": "termination"
}

Forward

This payload forwards a call to an external phone number (restricted to specific countries; ask us if you want to forward to a country currently not enabled).

If the outbound call could not be established, the bot will receive an outbound-failure event (see previous section). Otherwise, the bot will receive an outbound-success event as soon as the call is fully established.

Options
  • destinationNumber (required): The phone number to forward to. (+E.164 format, e.g. “+49721480848680”)

  • callerId (optional): The phone number displayed to the callee. (This is a best-effort option; correct display cannot be guaranteed.)

  • customSipHeaders (optional): An object where each property is the name of a header, and the value is a list of strings. All header names must begin with X-. For example:

    {
      "X-SomeHeader": ["some value", "another value"]
    }
    
  • ringTimeout (optional): The maximum time the call will be ringing (in milliseconds) before the attempt will be cancelled. By default, this is 120 seconds.

  • acceptAnsweringMachines (optional): Whether the bot should accept answering machines picking up. Answering machine detection is a best-effort functionality and bots should not rely on an exact detection. It also cuts off up to 5 seconds of the beginning of the call for detection purposes.

  • data (optional): An object with key-value pairs to be attached as custom data to the dialog.

  • experimentalEnableRingingTone (optional, experimental): Enables the playback of a ringing tone while the call is pending. This option will change in the future.

Examples
{
  "status": "forward",
  "destinationNumber": "+49721480848680",
  "callerId": "+49721480848680"
}
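The constraints on destinationNumber and customSipHeaders can be checked before the payload is sent. The Python sketch below is one possible way to do this; build_forward and the regular expression are assumptions (the pattern follows the general E.164 shape of a plus sign and up to 15 digits), and CVG may apply additional rules of its own.

```python
import re

# A plus sign followed by 2 to 15 digits, the first of which is not zero.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def build_forward(destination_number, *, caller_id=None,
                  custom_sip_headers=None, ring_timeout_ms=None) -> dict:
    """Build a forward payload after checking the documented constraints."""
    if not E164_PATTERN.fullmatch(destination_number):
        raise ValueError(f"not in +E.164 format: {destination_number!r}")
    for name, values in (custom_sip_headers or {}).items():
        if not name.startswith("X-"):
            raise ValueError(f"header name must begin with X-: {name!r}")
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            raise ValueError(f"header {name!r} must map to a list of strings")

    payload = {"status": "forward", "destinationNumber": destination_number}
    if caller_id is not None:
        payload["callerId"] = caller_id
    if custom_sip_headers:
        payload["customSipHeaders"] = custom_sip_headers
    if ring_timeout_ms is not None:
        payload["ringTimeout"] = ring_timeout_ms  # milliseconds
    return payload
```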

Bridge

This payload bridges a call to an external phone number (restricted to specific countries; ask us if you want to bridge to a country currently not enabled) for the Assist Use-Case.

If the outbound call could not be established, the bot will receive an outbound-failure event (see previous section). Otherwise, the bot will receive an outbound-success event as soon as the call is fully established.

Options
  • headNumber (required): The phone number prefix to bridge to. (+E.164 format, e.g. “+49721480848680”)

  • extensionLength (required): The range of extensions to choose a number from.

  • callerId (optional): The phone number displayed to the callee. (This is a best-effort option; correct display cannot be guaranteed.)

  • customSipHeaders (optional): An object where each property is the name of a header, and the value is a list of strings. All header names must begin with X-. For example:

    {
      "X-SomeHeader": ["some value", "another value"]
    }
    
  • ringTimeout (optional): The maximum time the call will be ringing (in milliseconds) before the attempt will be cancelled. By default, this is 120 seconds.

  • acceptAnsweringMachines (optional): Whether the bot should accept answering machines picking up. Answering machine detection is a best-effort functionality and bots should not rely on an exact detection. It also cuts off up to 5 seconds of the beginning of the call for detection purposes.

  • data (optional): An object with key-value pairs to be attached as custom data to the dialog.

  • experimentalEnableRingingTone (optional, experimental): Enables the playback of a ringing tone while the call is pending. This option will change in the future.

Examples
{
  "status": "bridge",
  "headNumber": "+49721480848680",
  "extensionLength": 3
}

Play

This payload can be used to play audio files to be heard by the caller.

Note the following requirements and limitations:

  • The audio file must be hosted at an Internet-accessible HTTP(S) endpoint. In case of HTTPS the server hosting the audio file must present a valid, trusted SSL certificate. Self-signed certificates cannot be used.

  • The audio file must be a valid wav file (waveform audio file format).

  • The file format must be one of the following:

    • Linear PCM with signed 16 bits per sample, with a sample rate of 8000 Hz or 16000 Hz

    • A-law with a sample rate of 8000 Hz

    • µ-law with a sample rate of 8000 Hz

Options
  • url (required): The location of the audio file.

  • bargeIn (optional): Allows the message to be interrupted by the speaker. (boolean, default false)

Examples
{
  "status": "play",
  "url": "https://example.org/some-audio.wav",
  "bargeIn": true
}
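Before hosting a file, it can help to check it against the format requirements locally. The following sketch uses Python's standard wave module to verify the linear PCM case only. The function name is hypothetical, and depending on the Python version the wave module may reject A-law and µ-law files, so those two formats are not covered here and a False result for them does not mean CVG would reject the file.

```python
import io
import wave

SUPPORTED_PCM_RATES = {8000, 16000}


def is_supported_linear_pcm(data: bytes) -> bool:
    """Return True if data is a WAV file with 16-bit linear PCM at 8 or 16 kHz."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            sample_width_bytes = wav.getsampwidth()
            sample_rate = wav.getframerate()
    except (wave.Error, EOFError):
        # Not a WAV file, or an encoding the wave module cannot read.
        return False
    return sample_width_bytes == 2 and sample_rate in SUPPORTED_PCM_RATES
```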

Recording Start

This payload can be used to start the recording.

Options
  • maxDuration (optional): Maximum recording duration in milliseconds. After the duration, the recording will be stopped automatically.

  • recordingId (optional): An arbitrary string to identify the recording in case multiple recordings are created in the same dialog.

  • speakers (optional): A list of audio channels to record. Possible values are CUSTOMER and AGENT.

Examples
{
  "status": "recording-start",
  "maxDuration": 20000,
  "recordingId": "string",
  "speakers": [
    "CUSTOMER",
    "AGENT"
  ]
}

Recording Stop

This payload can be used to stop the recording.

Options
  • recordingId (optional): An arbitrary string to identify the recording in case multiple recordings are created in the same dialog.

  • terminate (optional): Whether the recording should be terminated, rather than just paused. If terminated, the recording will be processed as soon as possible instead of deferring processing until the dialog has ended.

Examples
{
  "status": "recording-stop",
  "recordingId": "string",
  "terminate": false
}

Data

This payload can be used to attach custom data to the dialog.

Options
  • data (required): An object that can have arbitrary properties. Each property is expected to have a string value.

Examples
{
  "status": "data",
  "data": {
    "UsedLanguage": "language=$session.params.greeting.language",
    "CallerNumber": "$session.params.greeting.remote"
  }
}
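Since every property is expected to have a string value, payloads generated in code need to convert other value types first. The sketch below shows one way to do that in Python; build_data is a hypothetical helper, and encoding non-string values as JSON text is an assumption about what is convenient for the consumer, not something CVG prescribes.

```python
import json


def build_data(values: dict) -> dict:
    """Build a data payload whose property values are all strings."""
    data = {
        # Strings pass through unchanged; everything else becomes JSON text.
        str(key): value if isinstance(value, str) else json.dumps(value)
        for key, value in values.items()
    }
    return {"status": "data", "data": data}
```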

Debug

This payload can be used to log bot state to CVG for debugging purposes.

Options
  • details (required): The information to be logged as arbitrary JSON.

Examples
{
  "status": "debug",
  "details": {
    "some-field": 123.4
  }
}

Prompt

This payload starts a prompt of one of several types.

Options
  • message (required): The message to introduce the prompt to the caller.

  • language (optional): Overrides the synthesizer language for specific messages. (string, e.g. “de-DE”, defaults to the project language)

  • bargeIn (optional): Allows the message to be interrupted by the speaker. (boolean, default false)

  • timeout (required): The duration (in milliseconds) after which the prompt will be cancelled.

  • type (required): The type of prompt and all its details in a nested object. Please consult the Prompt API specifications for details.

Examples
Number Prompt

The following custom payload requests that CVG collect at most 4 digits via DTMF input, using # as the DTMF signal that ends input collection (number prompt).

{ 
  "status": "prompt",
  "message": "The prompt message for the user",
  "language": "en-US",
  "bargeIn": false,
  "timeout": 5000,
  "type": {
    "name": "Number",
    "maxDigits": 4,
    "submitInputs": ["DTMF_#"]
  }
} 

Some things to note:

  • There has to be at least one stop condition (so either maxDigits or submitInputs must be specified).

  • The caller response to such a prompt request will be a message with the answer-number input event (see previous section).
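The stop-condition rule can be enforced when the payload is generated in code. The following Python sketch assumes a hypothetical helper named build_number_prompt; only the emitted keys come from the options and example above.

```python
def build_number_prompt(message, timeout_ms, *, max_digits=None,
                        submit_inputs=None, language=None, barge_in=None) -> dict:
    """Build a number prompt payload with at least one stop condition."""
    if max_digits is None and not submit_inputs:
        raise ValueError("specify max_digits, submit_inputs, or both")

    prompt_type = {"name": "Number"}
    if max_digits is not None:
        prompt_type["maxDigits"] = max_digits
    if submit_inputs:
        prompt_type["submitInputs"] = list(submit_inputs)

    payload = {
        "status": "prompt",
        "message": message,
        "timeout": timeout_ms,  # milliseconds
        "type": prompt_type,
    }
    if language is not None:
        payload["language"] = language
    if barge_in is not None:
        payload["bargeIn"] = barge_in
    return payload
```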

Multiple Choice Prompt

Besides number prompts, there are also multiple choice prompts. In the following example, the two available “yes” and “no” choices can each be triggered by one of the provided synonyms. The synonyms also include DTMF digits, in case the user prefers to simply press ‘0’ or ‘1’.

{
  "status": "prompt",
  "message": "The prompt message for the user",
  "timeout": 5000,
  "type": {
    "name": "MultipleChoice",
    "choices": {
      "yes": [
        "yes",
        "yeah",
        "affirmative",
        "DTMF_1"
      ],
      "no": [
        "no",
        "never",
        "negative",
        "DTMF_0"
      ]
    }
  }
}

Some things to note:

  • The caller response to such a prompt request will be a message with the answer-multiple-choice input event (see previous section).
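On the receiving side, the three answer events differ only in the contents of their type object. A bot backend could read the result as sketched below in Python; extract_answer is a hypothetical name, and the field names are taken from the event examples in the previous section.

```python
def extract_answer(payload: dict):
    """Return the prompt result from an answer-* event payload, or None on timeout."""
    answer_type = payload["type"]
    name = answer_type["name"]
    if name == "Number":
        return answer_type["value"]  # the collected digits as a string
    if name == "MultipleChoice":
        return answer_type["id"]  # the key of the matched choice, e.g. "no"
    if name == "Timeout":
        return None
    raise ValueError(f"unexpected prompt result type: {name!r}")
```

Returning the choice id rather than the matched synonym keeps the calling code independent of which synonym or DTMF digit the caller actually used.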