Text-to-speech (TTS) is the process of converting text into the corresponding wave audio. To simplify call automation, Xtend IVR provides a sample script for its text-to-speech system. The automated IVR can play the text using the TTS engine. Various SAPI XML tags are used in the SPEAK command to implement tone variations and to handle several number-to-speech conversions.
Download the evaluation version of Xtend IVR and install the telephony application on your system, then run the sample script from the Script Editor. The script is listed after the tag table below.
The following XML tags are used in this script.
| Tag | Description |
| --- | --- |
| <SPELL> | Spells out the text |
| <SILENCE> | Introduces an interval of silence |
| <PARTOFSP> | Pronounces a word that has multiple pronunciations correctly, based on its part of speech |
| <VOLUME> | Adjusts the output volume level |
| <VOICE> | Selects a voice based on its attributes: Age, Gender, Language, Name, Vendor and VendorPreferred |
| <LANG> | Selects a voice based solely on its language attribute |
| <EMPH> | Emphasizes a section of text |
| <CONTEXT> | Lets the voice recognize and normalize special formats such as dates, numbers and currency |
| <PITCH> | Controls the pitch of a voice |
| <RATE> | Controls the rate (speed) of the voice |
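The tags in the table are plain XML fragments embedded in the string passed to the speak command. As an illustration only, the Python sketch below generates such strings. The helper names `sapi_tag` and `speak_line` are hypothetical and not part of Xtend IVR. The sketch assumes that the body is wrapped in single quotes, as in the sample script, and it does not handle nested tags or quote escaping.

```python
def sapi_tag(name, text=None, **attrs):
    """Build <name attr="value">text</name>, or <name attr="value"/> when text is None."""
    # Reject characters that would need escaping; escaping rules are outside this sketch.
    for value in list(attrs.values()) + [text or ""]:
        if any(ch in str(value) for ch in '<>&"'):
            raise ValueError("markup characters are not handled by this sketch")
    attr_str = "".join(f' {key}="{value}"' for key, value in attrs.items())
    if text is None:
        return f"<{name}{attr_str}/>"
    return f"<{name}{attr_str}>{text}</{name}>"


def speak_line(body):
    """Format a speak command with a single-quoted body (assumed quoting style)."""
    if "'" in body:
        raise ValueError("single quotes in the body are not handled by this sketch")
    return f"speak '{body}'"


if __name__ == "__main__":
    # An empty tag: a one-second pause in the middle of a sentence.
    print(speak_line("One second of silence " + sapi_tag("silence", msec=1000) + " just occurred."))
    # A wrapping tag: read the number digit by digit.
    print(speak_line("Spoken as digits: " + sapi_tag("context", "3432", id="number_digit")))
```

Generating the strings this way keeps opening and closing tags paired, which is the most common source of mistakes when the markup is typed by hand.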
MAIN:
    answer 1
    speak "Welcome. A variety of speak commands are given below."
    speak 'The following words are spelled out. <spell>These words should be spelled out</spell>'
    speak 'One Thousand milliseconds of silence <silence msec="1000"/> just occurred.'
    speak 'The following text differentiates the word "record" depending on its parts of speech. Did you <partofsp part="verb"> record </partofsp> that <partofsp part="noun"> record </partofsp>?'
    speak '<volume level="50">This text should be spoken at volume level fifty. <volume level="80">This text should be spoken at volume level eighty. </volume></volume><volume level="100"/>All text which follows should be spoken at volume level one hundred.'
    speak '<voice required="Language=409;gender=female">A U.S. English female voice should speak this.</voice><lang langid="809">A British English voice should speak this.</lang>'
    speak '<SAPI>This text is spoken without emphasis. This text is spoken <EMPH> with emphasis.</EMPH></SAPI>'
    speak 'Date is spoken now as month, day, year. <context id="date_mdy"> 03/04/2001 </context>'
    speak 'Date is spoken now as day, month, year. <context id="date_dmy"> 03/04/2001 </context>'
    speak 'A Cardinal number is spoken next. <context ID = "number_cardinal">3432 </context>'
    speak 'The following Number is spoken as digits. <context ID = "number_digit"> 3432</context> '
    speak 'A Fractional number is spoken now. <context ID = "number_fraction">3/15 </context> '
    speak 'Following is a Decimal Number. <context ID = "number_decimal">423.1243 </context> '
    speak 'A pronunciation for Currency follows. <context ID = "currency">$34.90 </context> '
    speak '<pitch absmiddle="5">This text should be spoken at pitch five.<pitch absmiddle="-5">This text should be spoken at pitch negative five.</pitch> </pitch><pitch absmiddle="10"/>All following text is spoken at pitch ten.'
    speak '<rate absspeed="5">This text should be spoken at rate five.<rate absspeed="-5">This text should be spoken at rate negative five.</rate> </rate><rate absspeed="2"/>All following text is spoken at rate two.'
    speak "Good bye."
    hangup
    goto MAIN
ONHANGUP:
    hangup
    goto MAIN
ONSYSTEMERROR:
    log $error
    display $error
    hangup
    goto MAIN
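Several speak strings in the script nest tags (for example volume inside volume), so an unbalanced tag is easy to miss. The Python sketch below is one possible pre-flight check, not an Xtend IVR feature: it wraps the markup in a dummy root element and asks a strict XML parser whether it parses. The function name `is_well_formed` is hypothetical, and the check assumes that strict XML well-formedness is at least as demanding as what the TTS engine requires. It does not verify that tag or attribute names are valid SAPI.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML once wrapped in a dummy root element."""
    try:
        # The wrapper is needed because a speak string mixes free text with tags
        # and therefore has no single root element of its own.
        ET.fromstring(f"<root>{markup}</root>")
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    balanced = 'Did you <partofsp part="verb"> record </partofsp> that?'
    # The outer <volume> tag is never closed.
    unbalanced = '<volume level="50">outer <volume level="80">inner</volume>'
    for sample in (balanced, unbalanced):
        print(is_well_formed(sample), sample)
```

Only the text between the quotes of a speak command should be passed to the check, not the command line itself.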