TAPI and SAPI Speech

Microsoft's Telephony API (TAPI) provides developers with a standardized interface to a rich selection of telephony hardware. By using TAPI, developers can write applications that support any device with a TAPI driver, also called a Telephony Service Provider (TSP). TAPI eliminates the need for developers to wrestle with device-specific APIs and enables well-behaved device sharing between TAPI applications. Unfortunately, TAPI is very complex and does not include direct support for useful speech technologies like text-to-speech (TTS) and speech recognition (SR). An entirely different specification, the Microsoft Speech API (SAPI), addresses these needs. We put both of these technologies together for you.

ExceleTel TeleTools makes adding TAPI telephony functionality to your applications quick and easy. In the following pages we have put together a fairly comprehensive guide to incorporating speech into your application using Microsoft's SAPI Speech SDK 5.1. In addition, we are developing sample programs in various development environments to help show you exactly how to do it yourself.

To create an application that uses both ExceleTel TeleTools (TAPI) and SAPI, you can write directly to SAPI or use a set of tools. Listed below you will find links to other sources to learn more.

NOTE: Not all telephony devices support Microsoft's SAPI. Please contact your telephony device provider for more information. In addition, ExceleTel cannot provide free support for Microsoft's SAPI. We provide the information and samples found here purely as a resource to help our clients in their search to build better applications.

In addition, Microsoft has released SAPI 5.2, but only as part of Windows Server 2003, and has no plans to release a standalone version. Our tests indicate that the samples work fine on machines with SAPI 5.2. However, SAPI 5.2 is designed to support only Microsoft Speech Server and the Microsoft Speech Application SDK; there is currently no SAPI 5.2 SDK for developing standalone applications.

 

This guide will cover the following topics:

  • How the SAPI Speech SDK works
  • How to install everything you need in several development environments
  • How to use SAPI speech with TeleTools to create speech-enabled programs
  • How to simplify using wave files in various formats in your application
  • How our events, methods and properties make it easy for you

What are some of the things you can do with TeleTools, TAPI and SAPI together? Well, you can create applications for call centers, computer telephony integration (CTI), Interactive Voice Response (IVR), messaging, call answering, voicemail and much more.

Text-To-Speech (TTS)

Text-to-speech technology works by breaking the sounds of a spoken language into "phonemes", the basic sounds that make up all human speech, such as "eh", "ah", "oh", "ow", and many others, and putting them together to form words. A lexicon inside a speech engine, such as the one that Microsoft includes in its free SAPI SDK, knows how to take words and pronounce them over a device with sound capability. It uses the standard Windows multimedia sound capabilities. If a word is not in the lexicon or doesn't follow a standard rule of pronunciation, SAPI supports XML (Extensible Markup Language) so that you can use HTML-like tags to tell it how to pronounce anything you like. You can also use XML to control spoken volume, speed and a number of other factors, all on the fly. More on this later.
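
As a rough illustration of the XML control described above, here is a small fragment of marked-up text using tags from the SAPI 5 XML TTS tag set. The sentences themselves are made up for this example, the attribute values are arbitrary, and how each tag actually sounds depends on the speech engine and voice you have installed.

```xml
<volume level="80">This sentence is spoken at eighty percent volume.</volume>
<rate speed="-3">This sentence is spoken more slowly.</rate>
<pitch middle="5">This sentence is spoken at a higher pitch.</pitch>
Please hold <silence msec="500"/> while I transfer your call.
Your confirmation code is <spell>XJ7</spell>.
<pron sym="h eh 1 l ow">hello</pron>
```

Text like this would be passed to the Speak method in place of plain text. SAPI's speak flags include SVSFIsXML, which marks the input as XML; whether you need to combine it with SVSFlagsAsync in your own code is something to confirm against the SAPI documentation.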

The code to do Text-To-Speech can be as simple as this:

VB...

etRecord1.DeviceID = etLine1.WavePlayID   'use TeleTools to get the output device
SpMMAudioOut1.DeviceID = etRecord1.DeviceID   'point the SAPI audio output at that device
SpVoice1.AllowAudioOutputFormatChangesOnNextSet = False   'keep the format we choose here
SpVoice1.AudioOutputStream.Format.Type = SAFT8kHz16BitMono   '8 kHz, 16-bit, mono
Set SpVoice1.AudioOutputStream = SpMMAudioOut1
SpVoice1.Speak "This text speaks", SVSFlagsAsync   'asynchronous: the call returns while speech plays

Delphi...

etRecord1.Device.ID := etLine1.WavePlay.ID; //TeleTools sets up your audio
SpMMAudioOut1.DeviceId := etRecord1.Device.ID;
SpVoice1.AllowAudioOutputFormatChangesOnNextSet := False;
SpVoice1.AudioOutputStream.Format.type_ := SAFT8kHz16BitMono;
SpVoice1.AudioOutputStream := SpMMAudioOut1.DefaultInterface;
SpVoice1.Speak('This text speaks', SVSFlagsAsync);

If you are ready to jump in without any further tutorial, here are links to our sample programs and to other sources on the web for information and tools. If you are more interested in finding out how it all fits together and exactly how you can build your own application, then please follow along with this guide by clicking HERE or on the link at the bottom of the page.

ExceleTel TTS sample programs

etTTSSimple

A simple Text-To-Speech sample that shows how to play speech over a phone line. etTTSSimple allows you to select a TAPI line device, choose a wave file format and play speech over a phone line during a connected call or over your sound card.

etTextToSpeech

A much more involved sample than etTTSSimple that, in addition to the above, shows how to change voices, speed, and inflection with Extensible Markup Language (XML). You can start, stop, and pause speech, and use the tokens feature of SAPI. Place outgoing calls or answer incoming calls, press DTMF digits and have the TTS voice speak them back to you.

More samples

We have many other samples that you can learn from and combine into any type of telephony program. Click on the link to learn more.

The following are a few companies that may help you add SR and TTS to your TeleTools TAPI application. Please note that we have no affiliation with any of these companies and list them here only as a resource to help you in your research:

Links to more information

Chant

Maker of Chant Developer Workbench and Chant SpeechKit CDLL, ActiveX, COM, Java, .Net and VCL SAPI components

Out & About Productions

Maker of DTalk, a SAPI speech recognition library written in Delphi as a VCL component and also available as an ActiveX control

Microsoft Speech 

Microsoft's Speech Technologies web page, discussing .NET and SAPI

Microsoft SAPI Speech Technology Group

Online documentation for the SAPI spec including ActiveX controls, COM objects, and more can be found HERE

Microsoft 3rd Party SAPI Support

Other sources of products and tools that support MS Speech

AT&T Voice Engine

Don't like a robot-sounding voice? A very cool interactive demo that lets you hear what somebody who knows how to write a speech engine can do with voice quality.

Digalo

Another speech engine vendor offering an inexpensive speech engine and multiple languages

Elan

High-quality speech engines and language support

Lernout & Hauspie (ScanSoft)

The speech conglomerate, with many engines and products

Please contact these companies directly for more information.

 

Continue to Page 2