Speech-to-Text and Text-to-Speech Services

CVG makes a range of Speech-to-Text (STT) and Text-to-Speech (TTS) services, with many voices, available for your voice applications in many languages.

Supported Languages

We support 61 languages and dialects:

  • Arabic (🇪🇬 Egypt)

  • Arabic (🇸🇦 Saudi Arabia)

  • Bengali (🇮🇳 India)

  • Bulgarian (🇧🇬 Bulgaria)

  • Catalan (🇪🇸 Spain)

  • Chinese, Cantonese (Traditional, 🇭🇰 Hong Kong)

  • Chinese, Mandarin (Simplified, 🇨🇳 China)

  • Chinese, Mandarin (Simplified, 🇭🇰 Hong Kong)

  • Chinese, Mandarin (Traditional, 🇹🇼 Taiwan)

  • Croatian (🇭🇷 Croatia)

  • Czech (🇨🇿 Czech Republic)

  • Danish (🇩🇰 Denmark)

  • Dutch (🇳🇱 Netherlands)

  • English (🇦🇺 Australia)

  • English (🇮🇳 India)

  • English (🇬🇧 United Kingdom)

  • English (🇺🇸 United States)

  • Estonian (🇪🇪 Estonia)

  • Filipino (🇵🇭 Philippines)

  • Finnish (🇫🇮 Finland)

  • French (🇨🇦 Canada)

  • French (🇫🇷 France)

  • French (🇨🇭 Switzerland)

  • German (🇦🇹 Austria)

  • German (🇩🇪 Germany)

  • German (🇨🇭 Switzerland)

  • Greek (🇬🇷 Greece)

  • Gujarati (🇮🇳 India)

  • Hebrew (🇮🇱 Israel)

  • Hindi (🇮🇳 India)

  • Hungarian (🇭🇺 Hungary)

  • Icelandic (🇮🇸 Iceland)

  • Indonesian (🇮🇩 Indonesia)

  • Italian (🇮🇹 Italy)

  • Japanese (🇯🇵 Japan)

  • Kannada (🇮🇳 India)

  • Korean (🇰🇷 South Korea)

  • Latvian (🇱🇻 Latvia)

  • Lithuanian (🇱🇹 Lithuania)

  • Malay (🇲🇾 Malaysia)

  • Malayalam (🇮🇳 India)

  • Norwegian Bokmål (🇳🇴 Norway)

  • Norwegian Nynorsk (🇳🇴 Norway)

  • Polish (🇵🇱 Poland)

  • Portuguese (🇧🇷 Brazil)

  • Portuguese (🇵🇹 Portugal)

  • Romanian (🇷🇴 Romania)

  • Russian (🇷🇺 Russia)

  • Serbian (🇷🇸 Serbia)

  • Slovak (🇸🇰 Slovakia)

  • Slovenian (🇸🇮 Slovenia)

  • Spanish (🇲🇽 Mexico)

  • Spanish (🇪🇸 Spain)

  • Spanish (🇺🇸 United States)

  • Swedish (🇸🇪 Sweden)

  • Tamil (🇮🇳 India)

  • Telugu (🇮🇳 India)

  • Thai (🇹🇭 Thailand)

  • Turkish (🇹🇷 Turkey)

  • Ukrainian (🇺🇦 Ukraine)

  • Vietnamese (🇻🇳 Vietnam)

Can’t find the language you need? Get in touch with us and convince us to add it!

Speech-to-Text Services

For most projects we recommend using Microsoft, Google, or IBM Speech-to-Text. All three support many languages and offer advanced features such as punctuation and profanity filtering. Whisper by OpenAI is a newer player that also does a great job. EML is hosted in our secure VIER environment and delivers its best results for German. Other Speech-to-Text engines are available on special request.

Choose one of these Speech-to-Text engines for your voice application by selecting it in the CVG console for your CVG project.

Microsoft

Microsoft Speech-to-Text uses deep neural networks to transcribe spoken words into text with high accuracy. It supports multiple languages and can handle noisy environments. The service can also be customized to recognize specific vocabulary and language models.

Microsoft Speech-to-Text can be used for all available languages supported by Microsoft.

CVG supports the configuration of a customized Microsoft Speech-to-Text endpoint. This enables our customers to use customized speech models for improved recognition performance. By providing hints, the transcription accuracy of domain-specific words (e.g. product names) or phrases can be boosted.

Our default Microsoft Speech-to-Text services are hosted in the Azure Cloud region “westeurope” for our European customers.

Google

Google Speech-to-Text uses advanced machine learning algorithms to accurately transcribe speech in real-time.

Google Speech-to-Text can be used for all available languages supported by Google.

Google Speech-to-Text is hosted by Google in the Google Cloud Platform (GCP). For our European customers, our default Google Speech-to-Text service uses EU regional API endpoints.

CVG provides two variants of Google Speech-to-Text: Google Default Speech-to-Text and Google Streaming Speech-to-Text.

Google Default Speech-to-Text

With the default Google Speech-to-Text (shown as “Google (Default)” in the CVG console), CVG itself detects the beginning and end of an utterance in the audio and sends only this utterance to Google as an audio snippet for transcription. This gives us more flexibility, e.g. when defining the end of an utterance.
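To illustrate what detecting the beginning and end of an utterance can look like, here is a minimal sketch of energy-based endpointing. It is not CVG's actual algorithm, which this page does not describe. The function name, the per-frame energy input, the threshold, and the `max_silence_frames` parameter are all assumptions made for the example.

```python
def find_utterances(frame_energies, threshold, max_silence_frames):
    """Return (start, end) frame index pairs; end is exclusive.

    Hypothetical sketch: a frame counts as speech if its energy is at or
    above `threshold`. An utterance ends once more than
    `max_silence_frames` consecutive silent frames follow speech.
    """
    utterances = []
    start = None      # index of the first speech frame of the open utterance
    silence = 0       # consecutive silent frames since the last speech frame
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence > max_silence_frames:
                # Close the utterance right after its last speech frame.
                utterances.append((start, i - silence + 1))
                start = None
                silence = 0
    if start is not None:
        # Audio ended while an utterance was still open.
        utterances.append((start, len(frame_energies) - silence))
    return utterances
```

In this sketch, raising `max_silence_frames` tolerates longer pauses inside one utterance, which is the kind of end-of-utterance tuning the paragraph above refers to.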

Google Streaming Speech-to-Text

With Google Streaming Speech Recognition, CVG streams the caller’s real-time audio to Google throughout the duration of the call. This is also referred to as “endless streaming transcription.” Speech-to-Text results are provided by Google as the audio is processed.

IBM

IBM offers a Speech-to-Text service as part of their Watson platform. This service uses machine learning algorithms to transcribe spoken words into written text in real-time. The service can also be customized to recognize specific vocabulary and language models.

IBM Speech-to-Text can be used for all available languages supported by IBM.

CVG supports the configuration of a customized IBM Speech-to-Text endpoint. This enables our customers to use customized speech models for improved recognition performance.

Our default IBM Speech-to-Text services are hosted in a European region of the Watson platform for our European customers.

OpenAI Whisper

OpenAI has a good STT service available for many languages. Currently we offer OpenAI Whisper hosted by OpenAI in the US.

EML

We host EML in our secure VIER environment. It is therefore a good alternative if you do not want to transfer your users’ speech to a US hyperscaler, even if they provide their service within the EU.

Text-to-Speech Voices

CVG supports several hundred standard and neural Text-to-Speech (TTS) voices. Neural voices sound more lifelike because they use a newer machine learning approach to convert text into speech.

Choose one of these voices for your voice application by selecting it in the CVG console for your CVG project.

The voice of your choice is used in a call when your application uses the /call/say (spec) or /call/prompt (spec) endpoints.
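As a rough illustration, a request to /call/say could be prepared as shown below. This is a sketch under assumptions, not the documented API: the base URL, the bearer-token header, and the JSON field names `dialogId` and `text` are placeholders, so check the linked spec for the actual request schema. The code only builds the request object and does not send anything.

```python
import json
import urllib.request

# Hypothetical placeholder; use the base URL of your own CVG environment.
BASE_URL = "https://cvg.example.invalid/v1"


def build_say_request(base_url, token, dialog_id, text):
    """Build (but do not send) a POST request for the /call/say endpoint.

    Assumed, not taken from this page: bearer-token auth and the JSON
    field names "dialogId" and "text".
    """
    body = json.dumps({"dialogId": dialog_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/call/say",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


request = build_say_request(BASE_URL, "my-token", "dialog-123", "Hello!")
```

Sending the request (for example with `urllib.request.urlopen(request)`) would then have the text spoken with the voice selected in the CVG console, as described above.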

If you plan to use SSML in your application, keep in mind that SSML support can vary widely between vendors and sometimes even between voices. Make sure you check the SSML documentation specific to the vendors you choose, especially if you plan to use a different vendor as a fallback.
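Because support differs per vendor, it can help to keep SSML generation in one small helper and to check that the markup is at least well-formed XML before sending it. The sketch below uses only the standard SSML elements `<speak>` and `<break>`. The helper names are made up for this example, and whether a given voice honors `<break>` still has to be checked in the vendor's documentation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_ssml(text, pause_ms=300):
    """Wrap plain text in <speak> and append a short pause.

    The text is XML-escaped so characters like & and < cannot break the markup.
    """
    return f'<speak>{escape(text)}<break time="{pause_ms}ms"/></speak>'


def is_well_formed_ssml(ssml):
    """Return True if the string parses as XML with a <speak> root element.

    This checks well-formedness only, not vendor-specific SSML support.
    """
    try:
        return ET.fromstring(ssml).tag == "speak"
    except ET.ParseError:
        return False
```

For example, `is_well_formed_ssml(build_ssml("Tom & Jerry"))` is `True`, while an unclosed `<speak>` tag is rejected.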

Amazon

All voices made available by Amazon in the Amazon cloud (AWS) can be used in CVG. This includes standard voices as well as neural voices.

Find a list of Amazon TTS voices here.

Contact us if you want to use one of these voices but can’t find it in your CVG console.

SSML Support

Google

All voices made available by Google in the Google cloud can be used in CVG. This includes standard voices as well as wavenet voices (neural voices).

Find a list of Google TTS voices here.

Contact us if you want to use one of these voices but can’t find it in your CVG console.

SSML Support

IBM

All voices made available by IBM in the IBM cloud can be used in CVG. This includes standard voices as well as v3 voices (neural voices).

Find a list of IBM voices here.

Contact us if you want to use one of these voices but can’t find it in your CVG console.

SSML Support

OpenAI

All voices made available by OpenAI in the OpenAI cloud in the US can be used in CVG.

Find a list of OpenAI voices here.

Note that OpenAI TTS is not available in the EU and does not support SSML yet.
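If your application produces SSML but may fall back to a voice without SSML support, one option is to reduce the markup to plain text first. The following is a minimal sketch of that idea. The function name is made up for this example, and dropping all tags is a deliberate simplification: pauses, emphasis, and similar hints are lost.

```python
import xml.etree.ElementTree as ET


def ssml_to_plain_text(ssml):
    """Strip SSML markup and return only the spoken text.

    Input that does not parse as XML is assumed to be plain text already
    and is returned unchanged.
    """
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError:
        return ssml
    # Join all text nodes, then collapse runs of whitespace.
    return " ".join("".join(root.itertext()).split())
```

For instance, `ssml_to_plain_text('<speak>Hello <break time="300ms"/>world</speak>')` yields `Hello world`.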

Microsoft

All voices made available by Microsoft in the Microsoft cloud (Azure) can be used in CVG. This includes standard voices as well as neural voices.

Find a list of Microsoft voices here.

Contact us if you want to use one of these voices but can’t find it in your CVG console.

SSML Support

Nuance

The following neural voices from Nuance are available in CVG:

  • US-English (en-US): Zoe

  • German (de-DE): Petra

We host Nuance in our cloud, i.e. a Germany-based datacenter.

Please ask us if you need another voice from Nuance.

SSML Support