WO2000018100A2 - Interactive voice dialog application platform and methods for using the same - Google Patents
- Publication number
- WO2000018100A2 (PCT/US1999/022145)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- list
- voice command
- user
- voice
- responses
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/06—Message adaptation to terminal or network requirements
- H04L51/066—Format adaptation, e.g. format conversion or compression
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/42204—Arrangements at the exchange for service or number selection by voice
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/50—Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers; Centralised arrangements for recording messages
- H04M3/53—Centralised arrangements for recording incoming messages, i.e. mailbox systems
- H04M3/5307—Centralised arrangements for recording incoming messages, i.e. mailbox systems for recording messages comprising any combination of audio and non-audio components
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/50—Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers; Centralised arrangements for recording messages
- H04M3/53—Centralised arrangements for recording incoming messages, i.e. mailbox systems
- H04M3/533—Voice mail systems
- H04M3/53333—Message receiving aspects
- H04M3/53341—Message reply
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/21—Monitoring or handling of messages
- H04L51/212—Monitoring or handling of messages using filtering or selective blocking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/21—Monitoring or handling of messages
- H04L51/224—Monitoring or handling of messages providing notification on incoming messages, e.g. pushed notifications of received messages
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/40—Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2203/00—Aspects of automatic or semi-automatic exchanges
- H04M2203/45—Aspects of automatic or semi-automatic exchanges related to voicemail messaging
- H04M2203/4509—Unified messaging with single point of access to voicemail and other mail or messaging systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2203/00—Aspects of automatic or semi-automatic exchanges
- H04M2203/45—Aspects of automatic or semi-automatic exchanges related to voicemail messaging
- H04M2203/4536—Voicemail combined with text-based messaging
Definitions
- the present application includes material which is subject to copyright protection.
- the copyright owner of the material in the present application has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the Patent and Trademark Office files or records, but otherwise reserves all copyrights whatsoever.
- the present invention relates generally to a user interface platform which provides interactive voice controlled user access to a telephony or other computer-based application.
- a specific application of the platform provides dial-in telephone access to a user's electronic mail, with advanced operation in response to voice commands .
- Voice mail systems collect and deliver voice telephone messages.
- e-mail systems generally receive text mail messages and deliver them to the intended recipient. In a business context, these systems may be readily accessible from the user's own office via telephone or computer, respectively. Access to incoming messages from other locations, such as while the user is traveling, may be more difficult.
- Remote retrieval of e-mail has been particularly difficult since retrieval typically requires access to a computer connected to the same network as the user's mail server.
- a typical solution to this problem has been to carry a portable computer when leaving the office, find a telephone jack at a remote location, and dial into the office network or the Internet to retrieve mail.
- Another solution is to use an e-mail account which may be accessed using a web browser, and to find someone at the remote location who will give the user access to an Internet-connected browser on equipment at that remote location.
- While DTMF tones may be useful for simple commands (e.g., play message, delete message, and the like), they do not readily allow a user to enter more complex commands (e.g., "forward message to John Smith").
- a traditional voice recognition system, in order to interpret such commands, may require recognition of a complete dictionary of spoken words, and then a command interpreter to interpret the converted text. If a word is misunderstood or not recognized, the complex command may not be executed.
- such systems generally work best with high quality audio signals (e.g., high end sound card, microphone, quiet office environment) rather than a noisy, limited bandwidth (e.g., POTS) signal generated from a noisy environment (e.g., pay telephone) .
- interaction through user-generated voice commands can make a system awkward and difficult to use.
- a user may not have patience to hear extended lists of menu items in order to respond to each message or decision.
- any pauses or extended delays in processing voice data from a user may cause frustration or mis-communication on the part of the user.
- if a voice command entered by a user is not understood or is incorrect, a method of quickly telling the user that a mistake has been made may be required.
- the present invention provides a system and method for allowing a user to hear text-based e-mail messages (as well as sound files such as .WAV files) played over an ordinary telephone.
- the invention allows the user to respond to such messages with voice commands to generate reply messages or reply sound files, as well as to input other e-mail commands (e.g., forward, delete, save, and the like).
- the system and method of the present invention provides such voice input by determining, in advance, possible voice commands or responses which may be generated in response to a played message or input prompt.
- Voice input signals are then compared to this limited list of possible responses (or "grammar") and the system generates a list of guesses as to which response has been spoken. Confidence levels are assigned to these guesses based upon the relative match between the actual response and the possible expected response. If the confidence level is above a dynamic threshold value, a match between the spoken response and the corresponding possible response is determined.
- the system need not compare spoken responses with an entire English language dictionary of words in order to understand the user.
- the present invention provides for additional responses to be input, including responses which may be unique to each user.
- a user's e-mail address list may be input as text and then converted into corresponding speech patterns or models using a text-to-speech conversion program. These patterns may be concatenated with speech models for a list of expected command responses, compiled, and loaded into the system.
- the system may understand a voice command of "forward to John Smith" as two sub-commands; the term "forward", which is one of the possible expected responses to a message (along with "delete", "save", and the like), and the term "John Smith", which may be from the user's e-mail address book and which, as described above, may be converted to speech models and concatenated with the expected responses.
- the system may dynamically learn, based upon frequency of use by a user, which phrases or commands are used more often. Based upon such usage, the mode of operation of the system may be dynamically tuned to minimize extraneous instructions and prompts. Thus, for example, when a user first uses the system, extensive prompts may be provided (e.g., "to save the message, say 'save' or press '1'"). Once a user has used that command several times, the prompt may be shortened or deleted entirely.
- the system and method of the present invention also provides a technique whereby replies may be generated to e-mail messages and transmitted to the sender.
- a user may select one of a number of stored replies which the user may have previously generated.
- the user may generate a voice reply which may be stored and transmitted as a sound file (e.g., a .WAV file) which an e-mail recipient may play over a computer system.
- the present invention provides a means of notifying a user that e-mail messages have been received.
- a user may selectively program the system (or the system may be pre-programmed) to notify a user, via pager or telephone, that a message or messages have been received. Notification may be made for some or all messages. For example, if a high priority message, or a message from a particular sender, is received, the user may be paged or otherwise notified that e-mail has been received.
- Figure 1 is a block schematic diagram of system architecture in the preferred embodiment.
- Figure 2 is a block diagram providing an overview of a virtual session provided to the user in the preferred embodiment.
- Figure 3 is a block schematic diagram illustrating flow of control from Telephony Voice Server 310 to the applications back-end for a single virtual session.
- Figure 4 is a block schematic diagram illustrating flow of control for a 24-channel (VS) front-end box from Telephony Voice Server 310 to the applications back-end.
- Figures 5a and 5b are flow charts illustrating flow of control for a Virtual Session voice recognition system.
- Figure 6 is a flow chart for Virtual Session context switching modules according to the preferred embodiment of the present invention.
- Figure 7 is a flow chart illustrating a process for a Virtual Session initialization of a new context.
- Figure 8 is a flow chart for a process of initialization and use of static grammar tokens according to the preferred embodiment .
- Figure 9 is a flow chart for a Virtual Session spontaneous compile and loading of application supplied information into current grammars.
- Figure 10 is a flow chart for a process of adding grammar token definitions to current grammar in a Virtual Session, according to the preferred embodiment of the present invention.
- Figures 11a and 11b are flow charts illustrating a phrase comparison process between spoken phrase and current Virtual Session grammar.
- Figures 12a and 12b are flow charts illustrating an overview of a Command Resolver process for script interpretation.
- Figures 13a and 13b are more detailed flow charts illustrating Command Resolver initialization and function.
- Figures 14a, 14b, and 14c are flow charts illustrating a process for Virtual Session DTMF/IVR processing according to the preferred embodiment of the present invention.
- Figure 15 is a state transition diagram for the telephony server process according to the present invention.
- Figure 16 is a flow diagram illustrating the relationship between the processes in the message polling subsystem of the preferred embodiment of the present invention.
- Figure 17 is a flow diagram illustrating the relationship between processes in the message receiving subsystem of the preferred embodiment of the present invention.
- Figure 18 is a flow diagram illustrating the relationship between processes in the message sending subsystem of the preferred embodiment of the present invention.
- Figure 19 is a system diagram illustrating the e-mail delivery systems of the preferred embodiment of the present invention.
- Figure 20 is a flowchart illustrating a process of updating a user profile through a web-based interface, according to a preferred embodiment of the present invention.
- Figure 21 is a dialog topology diagram.
- Figure 1 is a schematic diagram of a computer network architecture useful for providing access to messages by voice control from a remote location.
- the invention is preferably implemented through a computer network 100 including a voice interface server 102, database subsystem 104, file server subsystem 106, polling computer 108, mail sending computer 110, and web server 112.
- Database subsystem 104 incorporates database server 120 and massive database 122, which may be an OracleTM database.
- File server subsystem 106 incorporates file server 124 and message storage system 126 which may be a RAID disk array.
- Voice interface server 102 may be coupled to database subsystem 104 via hub 114, and database subsystem 104 may be coupled to file server subsystem 106, polling computer 108, mail sending and receiving computer 110, and web server 112 through hub 116. Polling computer 108, mail sending and receiving computer 110, and web server 112 may be coupled via hub 118 to Internet 130.
- the remaining computers and servers may also be coupled through hub 118 to Internet 130 to provide externally accessible network connections used for system administration.
- Voice interface computer 102 may be coupled via conventional telephone system interface and switching equipment to the Public Switched Telephone Network (PSTN) 101 or to another telephone network (not shown) .
- voice interface computer 102 provides a voice interface to system users through PSTN 101. Through this interface, users may dial in to the system, retrieve messages and take actions based on the messages such as placing telephone calls or replying to the messages.
- Voice interface computer 102 may run Telephony Voice Server 310 software program ( Figure 3) which connects a user to the system via telephone and processes user requests via speech or IVR under script control.
- Voice interface computer 102 also runs Application Proxy 330 and Automation Server 350 which contains the application API.
- Database server 120 runs software programs implementing an e- mail message store and message delivery agent.
- Polling computer 108 performs a POP3 mail polling function.
- Mail sending and receiving computer 110 receives forwarded electronic mail for storage and delivery to users, and forwards and sends e-mail in response to user commands.
- Web server 112 implements a personal profiling information system which allows users to create and modify a personal profile for system operation from any Internet-connected computer using an industry-standard web browser or from the telephone using specific voice commands.
- FIG 2 illustrates control flow for a preferred embodiment of Telephony Server (TS) .
- the TS establishes as many Virtual Sessions as there are telephone lines capable of supporting digital speech.
- Each Virtual Session (VS) interacts with a user under control of a script which the Command Resolver is currently running.
- Each Virtual Session has a dedicated voice recognition, speech synthesizer, and applications interface.
- Figure 3 is a block diagram illustrating flow control from Telephony Voice Server 310 to the applications back-end for a single virtual session.
- PSTN 101 may be coupled to Telephony Voice Server 310 which may be running on voice interface server 102 of Figure 1.
- the system implements an application interface which enables Telephony Voice Server 310 to have a network as its point of integration with an application.
- Communications conduits connecting Telephony Voice Server 310 may be Local Named Pipe 320, for example, under TCP/IP.
- the communications interface may be implemented on the same machine as Telephony Voice Server 310 but run as a separate multi-threaded NT Applications proxy 330 (the NT service) .
- Figure 4 is a flow chart illustrating the flow of control for a 24-channel (VS) front-end box from Telephony Voice Server 310 to the applications back-end.
- the Interactive Telephony Dialog Interface of the present invention presents a user with a flexible voice dialog system, accessible over the telephone, which allows a user to navigate and retrieve information by voice phrases and voiced connected digits, as well as by DTMF keypad strokes.
- the system may be configured to operate with more than one particular application, such as e-mail.
- the interaction between a user and the system may be completely scripted using a script interpreter and an easy-to-use language specified as part of the present invention.
- the system provides a user with a unique person/machine dialog-based interface on half-duplex (one-at-a-time conversation) or full-duplex (the system can be interrupted) telephone connections .
- the rhythm of conversation between the user and system is maintained by a tight coupling between speech elements (Speech Recognizer, Text To Speech, and Digital File Playback and Record) and the Command Resolver which implements an Object Oriented State Machine.
- the service provider can quickly reconfigure the system for new applications or interactions (i.e., greetings, on-line help, application process, and the like).
- This functionality is implemented in a unique multi-threaded Virtual Session architecture which allows multiple users to simultaneously have independent dialogs with the system.
- the system allows the user to mirror voice commands using keypad strokes under script control.
- the system allows mirror image functionality at the complete discretion of the script writer.
- the system also allows data entry and flow control via DTMF, all internally synchronized by the Virtual Session state machine.
- the system implements a dynamic context-based hierarchy which allows the user to jump around within the tree structure of an application either under voice or DTMF control .
- the result is a smaller active command phrase set which allows greater accuracy in noise and quicker response.
- Tokens may be placed anywhere in a command script and have the properties of script variables as well as dialog enhancements. Tokens may be parsed as regular text string expressions for content, enabling quick phrase-action resolution.
- the system may implement fully integrated User Interface features: a double tone when commands are not understood, and a dynamic help facility which is context-dependent and script programmable.
- the system implements an application interface which enables Telephony Voice Server 310 to have a network as its point of integration with the application.
- the network may be local or it may be Internet 130.
- the communications conduits connecting Telephony Voice Server 310 may be Named Pipes or Sockets under TCP/IP.
- the communications interface may be implemented on the same machine as Telephony Voice Server 310 but run as a separate multi-threaded NT service.
- the present invention may also include a communications proxy (the NT service) and an applications protocol.
- the system may run on dual Pentium computers under Windows NT 4.0 or higher (a multi-tasking OS with threads and events).
- the system may use TAPI (the Microsoft telephony API) as a telephony interface and SAPI (the Microsoft speech API) as a speech interface.
- the speech engines may comprise the AT&T Watson speech engines for Automatic Speech Recognition (ASR) and Text-To-Speech (TTS).
- the voice interface computer may incorporate various telephony boards.
- appropriate telephone interface boards include the following: Rhetorex/Octel RDSP 432, Rhetorex/Octel RDSP 24000, Rhetorex/Octel VRS-24, Rhetorex/Octel RTNI-ATI/ASI 24 Trunk, Rhetorex/Octel RTNI-2T1, Natural Microsystems AG-24, Natural Microsystems T, Connect-24, or the like.
- Telephony Voice Server 310 is a multi-threaded application written completely in C++. It may comprise three fundamental parts: The Dialog Thread, Telephony Monitor 240, and the Virtual Session.
- the Dialog Thread is the Primary Thread in which the entire Server initializes itself, once launched.
- the Server may be configured to operate in two modes at startup, depending on how the systems administrator wishes the Server to run.
- the default Server runs as an NT service.
- the Server runs on the NT desktop when launched with the
- the Initialization procedure comprises the following functions:
- a) Initialize the system log used to store statistical usage and runtime data.
- b) Determine the number of available telephone lines having the required Media Modes and telephone control sets.
- c) Based on the number of available telephone lines, create a Virtual Session Data Storage Class to accommodate thread-safe session data for each Virtual Session.
- d) Launch a Virtual Session to service each system-usable telephone line.
- e) Launch a Telephony Monitor 240 Thread to capture and dispatch Telephone line control messages to the Virtual Session Threads.
- Telephony Monitor 240 catches and dispatches messages associated with telephone control for each individual Virtual Session.
- Telephone messages which the system monitors may comprise TAPI call control messages:
- Telephony Monitor 240 catches call control messages using Event Wait States.
- the Telephone Service Provider/Driver is configured to alert the application through NT Kernel Object Events.
- Telephony Monitor 240 is not attached to the Primary thread of the Server, thus freeing it from blocking if the Primary Thread is processing windows messages while communicating with the user (Primary Thread contains the UI to the system administrator) .
- when the Service Provider Driver (in this case, the TAPI Service Provider, or TSP) generates call control messages, the system catches them in a Notification Event, decodes the message type as given above, and then sends the corresponding Virtual Session a windows message.
- The type of messages usually processed by Telephony Monitor 240 and dispatched to the appropriate Virtual Session are Connect, Disconnect, and DTMF. Call control handshakes, including lineOffering and lineAnswer, may be processed in Telephony Monitor 240. Only after a call has been established does Telephony Monitor 240 send a Windows Message to the Virtual Session servicing that particular line.
- More advanced call control functions such as outbound dialing and drop and insert functions, used to conference calls together, may be supported by the Virtual Session in a Telephony Class, Ccall. Therefore, in response to user commands via text-to-speech, the Virtual Session servicing the user may initiate telephony events on the line.
- responses from the telephony interface, in the form of handshake messages, are processed by Telephony Monitor 240.
- Telephony handshakes in the TAPI model always include lineReply and lineCallState messages which are caught by Telephony Monitor 240.
- the Virtual Session is the top level thread which handles all interactions with the user. There may be as many Virtual Sessions as active telephone lines.
- the Virtual Session may be indexed and identified by a linelD, which may correspond to the Voice Processing device associated with a telephone line.
- In order for an application to transfer digital speech to a physical telephony device, there must be a Voice Processing system in place which performs the following functions:
- a) Provide half-duplex input/output ports for each telephone line with associated Codec compression modes required by the ASR/TTS/wavefile components. Formats which may be supported include mu-law and 128 kbps PCM, 16-bit, little-endian digital format.
- b) Provide a full-duplex input/output port for each telephone line with associated Codec compression as above and echo-cancel.
- c) Provide an interface to switch voice ports on and off, and provide switching capabilities for outbound calling and data stream switching.
- Each Virtual Session has thread-safe session data, which may contain: a) Telephone Information; b) Multi-Media device information; c) Virtual Session State Machine flags; d) Virtual Session Data Store; and e) Call Statistics information.
- Each Virtual Session:
- a) Creates an associated ASR engine via an ASREngineObject class;
- b) Creates an associated TTS engine proxy via the TTSEngineObject class;
- c) Creates a SubWorker communications thread which processes communications events from a remote TTS daughter process via bi-directional message pipes;
- d) Creates a RunScript Thread which processes NT events, executing the CmdResolver to correlate speech-to-text user command phrases with associated actions embodied in the dialog scripts the session is currently running;
- e) Creates an associated hidden window and message pump which provides the Virtual Session with the ability to process windows messages; and
- f) Sets up a bi-directional message mode pipe which serves as a communication channel from the Server to an e-mail (or any other) applications Proxy.
- the voice dialog system is context-based.
- a context is defined as the set of phrases the system is configured to currently understand. All contexts available to the system at initialization time are dependent on initialization files.
- the initialization file may contain the following scripts:
- Each script may have a set of associated grammars and IVR maps which may be correlated to Exchanges which the user may have with the system.
- the structure of command grammar, IVR map, and the associated Exchange is the following:
- When phrase1, phrase2, !3, phrase4
- the "when” line denotes the set of command phrases
- the "!3" indicates that key pad 3 is associated with this Exchange
- the Exchange itself is contained between the outermost curly brackets.
- the Exchange correlates the command phrase and IVR map to the actions which the system will take if it decodes one of the phrases or the appropriate DTMF tone.
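Based on the structure described above (a "when" line listing command phrases and keypad tokens, with the Exchange between the outermost curly brackets), a script entry might look roughly like the following; the action names and exact keyword spellings are illustrative assumptions, not the actual script language:

```
When save, save message, !1
{
    PlayPrompt "saving message"
    SaveCurrentMessage
}
```

Here either spoken phrase ("save" or "save message") or keypad 1 would trigger the same Exchange, mirroring the voice/DTMF equivalence described elsewhere in the document.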
- the contexts available to the system are stored in the array of context classes :
- the context object above contains all information necessary for the Virtual Session to conduct the scripted exchanges with the user.
- the command resolver uses the context object to correlate the request with the appropriate action.
- Each script context has an associated context object.
- the system has been configured so that IVR Mapping, i.e., telephone key pad keys, are correlated to Exchanges in a Context in exactly the same way that phrases are correlated.
- the only difference is that the origin of the user request via voice is through the Voice/Dialog system and the IVR is via the DTMF interface, which will be explained in more detail below.
- the source of speech to text is ASR engine 510, a commercially available speech-to-text recognition system.
- ASR engine 510 may be normalized into a standard interface for the system, which may be notification driven.
- the notification system is modeled to be consistent with SAPI, which is the Microsoft speech standard.
- Phrase Finish 512 is a function which is called when ASR engine 510 has a result to test.
- Phrase Start 548 is a function which is called when ASR engine 510 begins to process a digital stream to try to correlate sounds with phrases in its active context.
- Engine Idle 554 is a function which is called when ASR engine 510 has processed all of the digital information in the AudioSource buffers and begins to wait for new information to come in.
- Barge-in 560 is a function which is called when ASR engine 510 encounters a barge-in token in a grammar phrase and has decoded the words to the left of that token.
- Phrase Start 548 sets Phrase Timer 550.
- Phrase Timer 550 marks the current time and notifies the Virtual Session via a Windows message when the time is up. Meanwhile, as illustrated in Figures 5a-5c, when ASR engine 510 reaches a result or when Phrase Timer 550 goes off, Phrase Finish 512 function is called. Phrase Finish 512 kills Phrase Timer 550 in step 514, and stops loading data in step 516, since the present recognition has been made.
- Phrase Timer 550 is programmable via scripts and serves to speed resolution of recognition in noise. The programmable parameter in Phrase Timer 550, $Phrase_Time, is the time to wait before notifying the Virtual Session that Phrase Finish 512 should be called.
- Phrase Finish 512 uses several state flags to resolve its decision tree. They are:
- In Phrase Finish 512, a determination is made in step 518 whether a valid recognition exists, by checking to see that the return structures of ASR engine 510 have a valid phrase (i.e., if a DTMF tone or some non-white noise were heard, ASR engine 510 might attempt a recognition). Failure would be flagged by not presenting the application with a resulting phrase.
- the system first checks to see whether the noise flag, m_phid, is set in step 532. If it is not set, the system sets it to True in step 534. If it is already set, then in step 536, the system flushes the AudioSource and resets ASR engine 510 environment tracking, then resets the flag to False indicating that the system has attempted to purge the noisy buffers. This noise flag checking step helps prevent noise from corrupting subsequent attempts at valid recognitions.
- Phrase Finish 512 distinguishes DTMF tones from bad phrases or noise in the following way:
- the system alerts the user that a mis-recognition occurred by playing a double tone (plink) in step 542. If the system plinks, the system sets the Virtual Session flag "playingbeep", a boolean flag, to True in step 542 to prevent collisions between the TTS and plink. Processing in the loop ends at step 546.
- Toggle-On is a method associated with the Engine Class, Sreng, and will turn the AudioSource on if there are no state conflicts. Toggle-On and Toggle-Off are discussed in more detail below.
- If the system has a valid result phrase, as determined in step 518, it calls Toggle-Off in step 520 to prevent ASR engine 510 from interrupting the present processing, and then obtains from ASR engine 510 the confidence score for the best phrase in step 522 (ASR engine 510 may have several guesses at the phrase based on its confidence). If the confidence score is below the confidence threshold, as determined in step 524, processing passes to routine 526.
- Routine 526 is illustrated in more detail in Figure 5b.
- the system first checks to see whether the noise flag, m_phid, is set in step 566. If it is not set, the system sets it to True in step 570. If it is already set, then in step 568, the system flushes the AudioSource and resets ASR engine 510 environment tracking, then resets the noise flag to False in step 572, indicating that the system has attempted to purge the noisy buffers.
- the system then checks to see whether the noise was a DTMF tone in step 574 in a similar manner to step 538. If the noise was not DTMF, the system alerts the user that a mis-recognition occurred by playing a double tone (plink) in step 576. If the system plinks, the system sets the Virtual Session flag "playingbeep", a boolean flag, to True in step 576 to prevent collisions between the TTS and plink. Processing in the loop ends at step 580.
- If, in step 574, the system determines that it has paused for DTMF, it assumes the completion of non-terminated DTMF, resets the flag, and restarts ASR engine 510 if the Toggle-On state flags permit in step 578. Processing in the loop ends at step 580.
- If the confidence level is greater than the threshold value in step 524, processing passes to the command resolver in step 528 and the loop ends at step 530.
- The present invention also encompasses a system for tracking the density of mis-recognitions which, based on a noise density range of 0 to 1, readjusts the settings of ASR engine 510.
- Noise density may be calculated as follows. Methods of the class Ftime count the number of mis-recognitions over total recognition attempts. Mis-recognitions and recognition attempts are counted in:
- Noise-Floor: a noise cut made on the input signal.
- The noise cut may be between 0 and -50 dBm.
- The adjustment range is between -15 dBm and -35 dBm.
- The default setting is 75, indicating a larger than 50% use of non-VQ models. In noise, the setting is changed to 100, indicating that no VQ models should be used.
- 1) Calculation of the noise density is done in PhraseFinish after each mis-recognition and after valid recognitions above threshold.
- If the grammar is not activated in the ASR (grammar_activated_), Toggle-On checks that none of the following states are set:
- Each script, designated by a file with a .scp suffix listed in the Session.ini file, defines a different context according to the Context Data Structures implemented in the present invention, as described previously.
- the Virtual Session is designed as a context-based system in order to limit the number of phrases active in the recognizer at any given time, thus enhancing recognition accuracy and speed of the system.
- A virtual session may switch context, i.e., enter the scope of a .scp file, in two ways:
- Upon initialization, the Virtual Session must start in a predetermined context (i.e., the login context), which may be controlled by the telephony system. Telephony Monitor 240 notifies the Virtual Session via LINECALLSTATE Connected or Disconnected whenever the line serviced by the Virtual Session becomes active (caller calls the system) or becomes inactive (caller hangs up).
- The Virtual Session executes the GetReadyForNewSession() method to reinitialize the context of the system to the login script whenever Telephony Monitor 240 notifies the Virtual Session that a new call has been connected on its line.
- The user may issue a command in some context which directs the system to go to another context. For example, a script might contain the following exchange: When send reply, !send reply, !reply to message {
- From step 614, processing then passes to step 618 (through block 616 as illustrated in Figure 6), where the event is caught by the RunScript thread.
- The RunScript thread is launched at Virtual Session initialization time and runs in parallel with the Virtual Session as illustrated in Figure 2. If RunScript Proc 618 determines that a "When" Exchange has occurred in step 620, the CmdResolver method ExCmd is called in step 624 with the Rec_Phrase. If a "When" Exchange has not occurred, RunScript Proc 618 looks for another event in step 622. Method ExCmd 624 determines an index of the exchange via the Rec_Phrase as specified above with reference to the Determination of Exchange Through Recognized Phrase.
- The Command Resolver calls the method CmdLoop.
- CmdLoop (Command Loop) determines how to execute the command in accordance with the operation of the Command Resolver as described herein. If this is a simple command (i.e., not a compound nested command), CmdLoop will call the Resolver method HandleAction in step 628 for each Action in the Exchange. All Actions in an Exchange are members of a linked list. If HandleAction 628 determines the command is a Load in step 630, it captures its argument, which is the name of the script to be loaded (the new context). If HandleAction 628 determines that the command is not a Load in step 630, HandleAction 628 looks for another action in step 632.
- In step 634, the script index is stored in the Command Resolver member "current_context_", and the script name is stored in SD "script_name" in step 636.
- The RunScript thread then sends the Event Hevntscriptmain in step 638, and processing of this stage ends at step 640. From step 638, Event Hevntscriptmain is caught by RunScript Proc in step 710 of Figure 7. RunScript Proc will then decode Event Hevntscriptmain in step 712 and call the Command Resolver method InitNewContext in step 716. If the Event is not HevntScriptMain in step 712, RunScript Proc will look for another event in step 714.
- InitNewContext then stops ASR engine 510 in step 710.
- InitNewContext calls ASRLoad in step 722 with the new script index, current_context , then loads the current Exchange pointer, Pexch with the address of the Main Exchange for the new context (script) in step 724.
- the pointer to the Main Exchange is found in the Context Class via the member "Pexch commands" as given above in the section relating to Context Data Structures.
- the Main Exchange is explained above in the section relating to the MultiServer Scripting Language.
- the Main Exchange is the default Exchange which is executed whenever a new context is entered.
- InitNewContext then calls CmdLoop in step 726 which processes each of the Actions in the Main Exchange of the new script.
- Since "flow control" in the script interpreter permits other Actions to occur while the TTS is still speaking, a WaitForTTSStopTalking is issued in step 728, since the system might come out of CmdLoop while the TTS is still talking. WaitForTTSStopTalking step 728 will block until the TTS stops; at this point the Main Exchange will have initialized the new context, and the InScript flag is set to False at step 732. In step 732, ASR engine 510 is started and processing of this routine ends at step 734.
- The dialog system uses embedded grammar tokens in command phrases for two reasons:
a) As wild cards to append special sub-grammars to script command phrases. For example, in the login script, connected digits may be used as sub-grammars introduced into the command phrases as tokens, since the exact length of a PIN code may not be known before entry (PIN numbers may be between 7 and 16 digits). Thus, the recursive nature of embedded sub-grammars is an efficient way to introduce variable grammars.
b) As a way of introducing spontaneous, application-related data into command phrases.
- each client of the system has a personal profile on the web server.
- The profile contains client-specific data such as "Names to forward messages to", "message replies to send", "personal Rolodex", and the like.
- Grammar Tokens are a mechanism to get data from the outside world into the command phrases (our recognizer is constrained by grammars) .
- Figure 8 illustrates the flow of control which occurs when the system initializes Static Tokens. Static Tokens are always initialized in the Main Exchange of a context as illustrated in step 810. Static Tokens correspond to case (a) above. When the system Loads a new context in step 810 the flow of control is as described previously. If the Script Interpreter finds static tokens in the Main Exchange in step 812, token processing proceeds as illustrated in Figure 8.
- HandleAction 814 finds a static token definition, by parsing a regular expression like the following:
- Every time a new context is initialized in step 810, the system searches, in steps 816, 820, and 822, the list of tokens it has compiled at initialization time from all the available contexts and fulfills the token rule definition given in the Main Exchange in step 824. From this list, it forms a new grammar rule in step 826 fulfilling the token part of the grammar.
- This token rule is loaded into ASR engine 310 via a call to the Command Resolver method "AddToGrammar" in step 826 without releasing the main context grammar (the "main” context grammar is defined as the grammar in which the token anchors are embedded) or any other token rules that the Main Exchange might specify.
- AddToGrammar accomplishes this by posting a Windows message in step 828 to the VS.
- The "our_grammar_token" member, containing all information on both the token name and its associated grammar definition, stores all active tokens in order to facilitate matching between spoken phrases and context phrases containing tokens. If a static token definition for the token does not exist, processing passes to step 818.
- Dynamic token definitions are updated via "our_grammar_token" in the Command Resolver method HandleAction during command resolution processing, as illustrated in step 930 in Figure 9.
- For the second type of tokens (b), dynamic tokens, the Virtual Session Grammar Object has two template lists of Grammar Token objects:
- stl::vector<GrammarToken*> grammar_tokens_;
  stl::vector<GrammarToken*> grammar_tokens_free_list_;
- the first list corresponds to the active list of grammar tokens and the second corresponds to the inactive list .
- Already existing grammar token objects are not deleted in order not to fragment memory. Since these tokens are dynamic, (i.e., their definitions change constantly) , creating and deleting dynamic tokens would be a burden on the OS memory manager.
- the Grammar Token Object has the following members:
- the "name” member allows string manipulation on the token name so that in making comparisons to active tokens, the system may determine whether a token is already active with an obsolete definition or should be newly instated (See Figure 10) .
- the "gram” member is a pointer to the ISGRAMCOMMON (this interface is specified in the Microsoft SAPI specification) interface to the ASR engine which allows the grammar rule corresponding to the token to be Activated, Deactivated, or Released.
- the flow of control for spontaneous loading of applications-related grammar rules is illustrated in Figure 9.
- Figure 10 is a flow diagram illustrating the operation of the AddToGrammar and SponLoad methods .
- In step 1110, when ASR engine 310 produces a result (a spoken utterance), it notifies the Virtual Session via a call to the PhraseFinish member of the ASR Engine class belonging to that particular VS. If the result is greater than the threshold (as described previously), the Rec_Phrase is passed to the method FeedFromASR, where the flag "Listening_To_ASR" is set False after deciding to process the result.
- In step 1112, "Listening_to_ASR" is used by the CR, in FeedFromASR, to determine whether or not to process the result. If the flag "Listening_to_ASR" is set True, processing ends at step 1114.
- FeedFromASR sends the Windows message Hevntscriptwhen, which will be caught by the RunScript thread in steps 1118 and 1120 and resolved so as to execute the Resolver method ExCmd in step 1124.
- In step 1136, the context is examined to determine whether it includes embedded tokens. If the context does not include embedded tokens, the Rec_Phrase is compared to the Context Phrases in step 1142 to find a match.
- If a match is found in step 1142, processing passes to step 1148.
- In step 1148, a determination is made whether this is the last phrase. If not, processing passes to the next phrase in step 1150 and processing returns to step 1142. If it is the last phrase, the index is decremented and processing then ends at step 1154.
- In step 1138, a token count and token names in context phrases are retrieved. Processing then passes to Figure 11b via block 1140.
- The tokens are expanded in step 1156 according to their definition in the Resolver member our_grammar_token_list_, which is a list of all grammar tokens known in a particular script.
- The expanded token versions of context phrases containing embedded tokens are then compared to the Rec_Phrase in step 1158.
- If a match is not found in step 1158, a search is made for more context phrases in step 1166. If more phrases are found in step 1168, processing returns to step 1158 above. If no further context phrases are found, the index is decremented and processing ends at step 1162.
- a Command Resolver is associated with each VS, and the operation of this Command Resolver in the preferred embodiment will now be described in more detail.
- the Command Resolver takes a recognized utterance and performs the following actions:
- Virtual Session creation is governed by system startup, as illustrated by step 1310 in Figure 13a.
- the Command Resolver initializes a Virtual Session by reading the ".ini" file, DIF (Dot Ini File), associated with the particular VS.
- DIFs and VSs are correlated by a Master Configuration File (MCF) for Telephony Voice Server 310.
- Telephony Voice Server 310, monitoring 24 different telephone lines, may run 24 different DIFs, which are correlated to the 24 VSs monitoring those lines via the MCF in step 1312.
- the MCF looks like this:
- The contents of the DIF were described previously in the description of Context Data Structures.
- When a Virtual Session initializes, it creates a Command Resolver object and a RunScript thread.
- the Command Resolver object is initialized by parsing the script files in the DIF associated with it via the MCF.
- the Command Resolver initialization creates and populates an array of Context objects associated with each script.
- A map of class objects associated with the Command Resolver is illustrated in Figure 3. For example, if there are 10 script files in a DIF, then an array of 10 context objects is created for that VS. They are indexed via the Command Resolver members: int current_context_;
- The "m_Pcmdcontext" member is an array of pointers to context objects corresponding to different scripts.
- NT events are sent by the Virtual Session to the RunScript thread.
- the Command Resolver runs in the context of the RunScript thread, as "flow control" is imposed on the script interpreter. Flow control means that for certain actions the script interpreter may block. These actions fall into two categories :
- Actions which involve the TTS talking or wavefiles playing. The system speaking to the user is a sequential operation: at most, the system may be speaking while another speaking action is queued, but they cannot occur at the same time. Thus, the script interpreter must block (before playing the next speaking action) until the first one has finished.
- Actions which return data required to proceed. This typically involves interactions with the application. Since Telephony Voice Server 310 is linked to the application (e-mail, for instance) via a network, a finite time is required to receive a response to an application query. During this time the script interpreter must block pending the receipt of the required information.
- BlockTillReady takes an argument which is an enumerated type.
- the argument tells the function what NT event it should receive so as to stop blocking.
- the argument may be any of the following:
- EVT_WAKE_UP: general stop-blocking event.
- EVT_TTS: sent when the TTS is finished talking.
- EVT_DTMF: sent when DTMF is in. Resets the "safe_to_plink" and "waiting_for_DTMF" flags.
- EVT_ASR: sent when the ASR has finished loading grammar. Resets the ASR_is_loading_grammar flag.
- SYSTEM_KILL: sent when the system is coming down.
- EVT_TTS_ABRT: sent so as not to block on a queued system speak, since that queued speak has been aborted.
- BlockTillReady receives an event, checks to see if the event corresponds to the one it was programmed to expect, and stops blocking if it finds a match. Depending on the event received, BlockTillReady also sets Command Resolver state flags to the appropriate state as illustrated above.
- The basic task of the Command Resolver is to execute the Actions contained in the Exchange correlated to the spoken phrase. As illustrated in Figures 11a and 11b, the Resolver first determines the correct Exchange, then it passes control to CmdLoop. The flow of control for CmdLoop is illustrated in Figures 12a and 12b. CmdLoop 1210 gets the first Action in the Exchange from the Pexch pointer in step 1212.
- CmdLoop 1210 checks this Action to determine whether it is a simple action or a Loop in step 1214. If the Action is a Loop in step 1216, the Action mnemonic is: Loop(Argument).
- the structure of the Loop Action is as follows:
- CmdLoop keeps track of the Loop argument and executes each of the instructions within the scope of the Loop as many times as the argument provides in step 1220.
- CmdLoop determines whether the action is a subaction.
- a subaction is one which is delineated by a pair of curly brackets and whose initial action was either a Loop or a Conditional.
- Conditionals are Actions which execute a set of Actions dependent on the result of some condition being True or False.
- An example of a conditional is as follows:
- CmdLoop 1210 determines the number of times the loop should be executed by loading the variable nloop in step 1216. Otherwise, nloop is set to 1 in step 1218, since the current Action should be executed once.
- Step 1230 checks for a subaction, i.e., a set of curly brackets.
- CmdLoop proceeds to execute the Loop until nloop has been decremented (in step 1242) to zero (determined at step 1228), at which point CmdLoop either executes the next action in the outermost scope or the next action is a Null. If the next action is a Null, as determined in step 1232, CmdLoop terminates in step 1240.
- HandleAction is the Command Resolver method which parses and executes the current Action.
- Applications commands handled by HandleAction are those which do not require interaction with the applications interface. For example, commands such as: Load, WaitForDTMF, Say, and the like do not require interaction with the application. Commands which do, such as: NextMessage, PreviousMessage, and the like are executed by ProcessAppAction .
- ProcessAppAction communicates with the application via full-duplex, message-mode Named Pipes. There is an applications pipe for each VS. ProcessAppAction parses and formats commands for the applications interface, sends the request, and buffers the response so that it becomes available to the Virtual Session via the Session Data buffer Fdbufout.
- the flow of control for the Command Resolver is illustrated in Figure 13b.
- Telephony Voice Server 310 offers 11 programmable keys per context including the * key.
- The # key is reserved for a TTS or digital file play interrupt. Examples of a keypad key being mapped for IVR functions are:
- keypad keys may be programmed as an alternate to voice commands in performing various exchanges.
- In the first example, keypad 3 is programmed as an alternative to the user saying "next message"; in the second example, entering a sequence of digits greater than one digit and terminated by the # sign defaults to the second exchange, for which there is no voice alternative.
- This instance of IVR mapping is used as a way of entering a string of digits as data, such as a Pin Number.
- the Command Resolver treats voice commands and IVR maps in the same way. It associates exchanges with DTMF keys in the same way that it associates voiced phrases with an exchange index as explained above with reference to Context Data Structures .
- Context of the script (for example, email2.scp, forward.scp). This denotes being in the context of a script proper; the system is in the "listen state" and all commands associated with that script are available.
- ReadDTMF is a multi-character input function which maps a DTMF string to the value of a variable. For example:
- WaitForDTMF allows multiple DTMF entry terminated by the # sign.
- While the system waits for DTMF entry, a timer defines an error condition, i.e., the user has failed to enter DTMF.
- DTMF digits arrive at the Telephony Monitor 240 thread as a TAPI notification.
- the system uses a TAPI generated event to signal a response.
- the following TAPI messages are decoded:
- the TAPI event interrupts Telephony Monitor 240 and resolves to LineMonitorDigits.
- The LineMonitorDigits function records the digits in Session Data (SD); then, according to the particular LineID associated with the event, LineMonitorDigits calls the FeedFromDTMF function associated with the Command Resolver created in the context of the Virtual Session associated with the current LineID.
- Flow of control for FeedFromDTMF is illustrated in Figure 10a.
- FeedFromDTMF accumulates the digits as they come in, into the SD variable dtmf_.
- FeedFromDTMF checks two state flags :
- The role of FeedFromDTMF is to determine, based on state flags, whether the system is in the context of the script or in the context of an Exchange. The biggest difference in the actions of the system, given these two different states, is that for continuous digit entry in the context of a script, a "when" Exchange must be mapped to "##" as described above.
- the OOE (Out Of Exchange) flag indicates that the current context does support this mapping.
- an Exchange WaitForDTMF allows multiple digit entry with a terminator.
- The function DTMFFinished may stuff a script variable with a DTMF value if ReadyForDTMF is set, or set the "Heventscriptdtmf" event, allowing the RunScript thread to catch and decode an IVR map. If the WaitingForDTMF flag is detected, the system calls DTMFDigitsAreAvailable, which prompts BlockTillReady to look for a # sign in the current DTMF string.
- This command encapsulates script actions which must be executed irrespective of the present state of the system.
- The CriticalSection command was designed to ensure that critical code, typically initialization code, is always executed. Parts of Exchanges may be aborted via the actions of the user. For instance, the user may decide to IVR map into another Exchange before the present Exchange has had a chance to initialize all of its critical variables.
- The CriticalSection command ensures that the system is always in a known state.
- The GetSay(ptext) command is used to retrieve and speak text strings from the application.
- GoTo Command Action: allows the system to GoTo an Exchange corresponding to a pre-mapped IVR "When" statement.
- GetSay(ptext) mail.
- NextMessage() $remaining mail.
- Load() Command Action
- the Load command allows the system to load another script and jump from the present context to the context specified by the argument of the Load function.
- the scriptname . scp argument must point to a valid script.
- a valid script is one whose path may be resolved. In the present system all active scripts appear in specific directories given to the system via environment variables. Also, the script must appear in the ".ini" file for the system. This enables the Virtual Session parser to include the script in the system context tree.
- An example of the Load command is :
- Loop() Command Action
- the Loop command executes an Exchange the number of times specified in the argument of the Loop command.
- Loop command executes a sub-Exchange the number of times specified in the argument of the command.
- An example of the Loop command is as follows:
- GetSay(ptext) mail.
- GetSay(ptext) mail.
- ReadNewMessage($result) $continue False } } }
- the Main Exchange is the default Exchange for a script .
- Each script may have only one Main Exchange.
- a script is not required to have a Main Exchange, however if it has one the Main Exchange is automatically executed as soon as the script is loaded by the system.
- Nested Exchanges enable the script writer to execute a sub-Exchange within the scope of an Exchange, given some appropriate command as an anchor.
- Nested Exchanges may be nested as deeply as the script writer wishes.
- Anchors for Nested Exchanges may be:
- anchors may comprise language structures which allow execution of an Exchange based on the outcome of some test.
- This command is used to play a recorded file.
- PlayWav command may be used to play recorded files.
- An example of PlayWav is :
- The ReadDTMF command is used to alert the system that logic in the current Exchange requires non-blocking keypad entry. ReadDTMF is equated to a script variable which holds the value of the next keypad entry the user makes.
- ReadDTMF function is used for any Exchange which requires non-blocking DTMF entry.
- The advantage of this function is that it works with a script variable; thus all the operations which a script variable allows are available to it. For example:
- The variable result is initialized upon keypad entry.
- the script may manipulate the result of keypad entry to :
- the Say command spools text to the TTS, thus enabling the system to speak whatever text is contained in the argument of the Say command.
- The text argument above is a text string which the script writer fills in. It may either be a constant string or a string with embedded script variables. For example:
- &&& $forcast 'bright and sunny'
a) Say the weather will be bright and sunny.
b) Say the weather will be $forcast.
- Script Variables may be defined by the script writer within the context of a script.
- Script Variables have a content value and a Boolean value .
- the "name" of the script variable may be any number of alphanumeric characters up to 20 characters in length.
- The name of the variable may contain underscores, i.e., $welcome_complete, $old_messages, and the like.
- Each script variable has the scope of the script; even if a script variable is defined in the context of an Exchange, its meaning is valid throughout the script.
- Each script variable has two values:
a) Content value: the value of a literal string.
b) Boolean value: if the variable has been initialized via its content value, its Boolean value is True; if the content value has yet to be initialized, its Boolean value is False.
- Script variables may also be inserted into text strings, i.e., :
- $forecast 'Todays weather will be' $weather_forcast .
- $forcast 'Todays weather will be hot and sunny' .
- Terminate Command Action The Terminate command may be called in a script when the user has indicated they wish to terminate the current session. It reinitializes the Virtual Session to accept the next phone call.
- An example use of the Terminate command is when the user says "goodbye", i.e., the appropriate Exchange is as follows:
- Token variables are variables which may become grammar rules. Token variables may be manipulated like script variables, i.e. , they may be equated to script variables, they may be inserted into a text string, they have a Boolean value of True or False, depending on their state of initialization.
- Token variables may be associated with script-dependent grammars or may represent additions to script grammars whose origin is the application. This is a way of incorporating into the script grammar information accessed through the application. The origin of this information may be databases, the network, PIMs, and the like. Token variables may also appear in Main Exchanges as script-specific sub-grammars:
- GetSay(ptext) mail.
- SelectNewMessage($<digits>)
- The Token variable <digits> is used in the grammar to include the user saying any combination of the constant grammar on the "When line" and an arbitrary string of connected digits specified by <digits>.
- <digits>+ means any number of digits specified by the definition of <digits>.
- The definition of <digits> is read: one or two or three or four ... or nine.
- The exclamation mark in <!digits> means that in the help system the system should speak the word "digits" in the command "please read message digit" instead of inserting the definition of the Token.
- The system fills the Token variable $<digits> with the string corresponding to the numbers which the user has spoken. For example, if the Exchange was executed because the user said "please read message five one", the Token variable inserted into "Getting new message" is "five one"; thus the system says to the user "Getting new message five one".
- The WaitForDTMF command is used when exchange logic requires blocking, terminated DTMF entry. Syntax: WaitForDTMF
- WaitForTTS Command Action: The WaitForTTS command is used to impose flow control on the execution of the script. Since the execution of the script does not necessarily block while the TTS is speaking, the script writer may impose this constraint in certain instances.
- the first instance of the When statement above associates phrases, one through three, with the Exchange which follows it .
- the Exchange is defined by the group of actions following the When line and is delineated by the outermost curly brackets enveloping the actions.
- the exclamation marks preceding phrases two and three exclude these phrases from the automated help system. All phrases without preceding exclamation marks are included in an automated help system invoked by the function "SayHelpCommands" (see SayHelpCommands) .
- The sequence "#number" in the first When statement denotes an IVR map to the keypad number "number". Numbers preceded by the # sign flag an IVR mapping between the When statement's Exchange and the keypad number following the # sign.
- The double pound sign, ##, in the second When statement denotes an association between keypad entry of a string of numbers and the Exchange associated with the corresponding When statement. For example, the "When ##
- FIG. 15 is a state transition diagram for the Telephony Server system process according to the preferred embodiment of the invention. Each Virtual Session in the system has voice resources, play/record facilities, and a Command Resolver. The functional interrelation between these elements is illustrated in the Virtual Session system flow diagram. Referring to Figure 15, the state changes, as denoted by numbers in the flow diagram, are defined as follows:
- Transition 4: Recognition event; check threshold.
- Transition 5: Recognition above threshold; go to Resolver.
- Transition 6: Good Exchange index; go to CmdLoop.
- Transition 7: Exchange processed; go to Listen state.
- Transition 8: Exchange contains nested exchanges. Process nested exchange.
- Transition 9: Action parsed in exchange. Handle Action.
- Transition 10: Action parsed in nested exchange. Handle Action.
- Transition 11: Action queues the system to speak.
- Transition 12: Stop speaking notification detected by system.
- Transition 13: After TTS has completed, go to next action in the exchange.
- Transition 14: Handle Action has determined that this action requires communication with the application. Actions of the form "application.function" require communication with the application.
- Transition 15: Action requires addition of new grammar via grammar tokens.
- Transition 16: TTS has stopped and there are no more actions in the exchange. Check system state.
- Transition 17: System state permits transition to Listen state.
- Transition 18: Context switching complete. Context initialized; go to Listen state.
- Transition 19: Handle Action has found a Load command within the current exchange. System transitions to new context.
- The electronic mail services provided in the preferred embodiment will now be described with reference to Figure 19.
- The services provided are based on Internet mail standards.
- Simple Mail Transport Protocol (SMTP) is used to exchange messages between Internet mail servers.
- Post Office Protocol 3 (POP3) is utilized by Internet mail clients to retrieve messages.
- The system implements each protocol, allowing it to receive and/or retrieve Internet e-mail messages for users. Users retrieve messages through a telephone interface.
- The e-mail system comprises five primary components.
- Message Polling subsystem 1910 retrieves e-mail messages using POP3.
- Message Receiving subsystem 1912 receives messages from SMTP servers.
- Message Delivery subsystem 1914 processes and stores messages in the MyInbox system 1916.
- Message Sender subsystem 1918 formats and sends (via SMTP) outgoing replies and forwards.
- Web Service 1920 provides user personal profile maintenance and system administrative tools.
- The diagram in Figure 16 illustrates the relationships between the components of the message polling subsystem.
- The Polling Subsystem actively retrieves messages by establishing POP3 connections to the user's electronic mail system. Available messages are checked against the list of messages retrieved during previous sessions. Those messages identified as new are copied into the system.
- Polling subsystem 1910 may comprise two components, Account Scheduler 1610 and Message Poller 1612. Generally, processing proceeds as follows: 1. The Poller requests an account from the Account Scheduler.
- 2. The Scheduler selects an account from the database and returns it to the Poller.
- 3. The Poller attempts to establish a connection with the user's POP3 server. If successful, the Poller logs in using credentials provided by the user during sign-up.
- 4. A list of available messages is retrieved and compared with those known to have been downloaded in a previous session. New messages are downloaded and processed by the Message Delivery Agent.
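The final polling step amounts to a set difference between the server's message list and the messages recorded in previous sessions. A minimal sketch, assuming messages are identified by unique ID strings (as POP3's UIDL command provides); the class and method names are hypothetical:

```java
// Sketch of new-message detection during a polling session: keep only the
// server-side message IDs not seen in any earlier session.
import java.util.*;

public class NewMessageFilter {
    /** Returns the message IDs on the server that have not been downloaded before. */
    public static List<String> newMessages(List<String> onServer,
                                           Set<String> alreadyDownloaded) {
        List<String> fresh = new ArrayList<>();
        for (String id : onServer) {
            if (!alreadyDownloaded.contains(id)) fresh.add(id);
        }
        return fresh;
    }
}
```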
- FIG. 17 illustrates the relationships between the components of message receiving subsystem 1912.
- Message Receiving Subsystem 1912 receives messages sent to users' accounts via SMTP server 1710. Messages enter the system through a program called MetaInfo Sendmail 1712, an implementation of the industry-standard SMTP server. Sendmail in turn invokes the Message Receiver's remaining components, the Uagent program 1714 and Message Handler 1716. Generally, processing proceeds as follows:
- An external SMTP server connects to the sendmail server and transmits a message.
- Sendmail invokes Uagent, a specific implementation of a local delivery agent, or LDA.
- The LDA's responsibility is to deliver messages to a local user and indicate to sendmail whether the operation completed with or without errors.
- Uagent in turn locates a Message Handler instance, reads the message, and hands it off to the Handler for further processing.
- Message Delivery Agent 1914 processes messages, storing summary information and text-to-speech translations in Oracle database 122. Complete message contents are inserted into a file-system-based message store.
- Message Delivery Agent 1914 is not a free-standing program, but an object component used by both inbound message processing subsystems. Its functions include:
- Message Sender 1810 is responsible for the preparation and delivery of user-created reply and forward messages. In rather simple fashion, Message Sender 1810 monitors a queue of outgoing messages. As outgoing messages are discovered, messages are removed from the queue, prepared for delivery by sendmail 1812, and transmitted through SMTP server 1814. Generally, the processing steps are as follows:
- 1. The Sender monitors the outgoing message queue for new forwards and replies.
- 2. The message is read, merged with user-specific information, and formatted for delivery.
- 3. The sendmail server is contacted for actual message delivery.
- The Java Web Server, while not directly involved with the receipt, processing, or delivery of messages, hosts several critical interfaces. The overwhelming majority of these interfaces are implemented with the Java Servlet API.
- End-user functionality includes registration, POP3 account configuration, exclude and priority filters, predefined responses, and the personal directory.
- Administrative interfaces include usage reports, corporate account management, server configuration, and service monitoring and control.
- Figure 20 illustrates a process for creating or maintaining a user profile using a web-based interface.
- The user accesses the server using an industry-standard web browser from any Internet-connected computer.
- The user identifies his account and enters a passcode to obtain access to his individual profile, as illustrated in Block 2004.
- The user may then, as illustrated in Block 2006, enter personal directory information.
- This information may include at least the first name, last name, and e-mail address of persons to whom e-mail messages may be regularly forwarded. If the name entered in the personal directory is difficult to pronounce, it is useful to spell the name phonetically or to use a nickname instead of a first and last name.
- The personal directory may include other information such as telephone numbers.
- The user may also, as illustrated in Block 2008, create and edit personalized, pre-set standard reply messages. Any number of these messages may be created, and they may be updated at will.
- The information entered includes a reply message name by which the reply message will be specified in the voice control mode.
- A personalized message is entered. For example, the reply message name "Thanks" might be associated with the message "Thanks for the e-mail, I heard it while driving home and will get back to you."
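The reply-name association described above is essentially a keyed lookup from the spoken name to the stored message body. A minimal sketch, with hypothetical class and method names; the case-insensitive lookup is an assumption suited to recognizer output:

```java
// Sketch of predefined standard replies: the user defines name -> message
// pairs on the web, and the voice interface selects a message by its name.
import java.util.*;

public class PredefinedReplies {
    private final Map<String, String> replies = new HashMap<>();

    /** Stores a reply under its spoken name (case-insensitive). */
    public void define(String name, String message) {
        replies.put(name.toLowerCase(), message);
    }

    /** Returns the reply body for a recognized reply name, or null if undefined. */
    public String forName(String spokenName) {
        return replies.get(spokenName.toLowerCase());
    }
}
```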
- A message priority list may also be created in the user profile, as illustrated in Block 2010.
- The user may enter any of the following in corresponding data fields: sender name, sender e-mail address, sender domain, subject line text keywords, and message body text keywords. In operation, if any of these fields match the corresponding characteristics of an incoming e-mail, that e-mail will be designated for priority delivery and will be delivered by voice e-mail before those messages not enjoying similar priority.
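The priority test above can be sketched as follows. The field names and the matching rules (case-insensitive substring match for names and keywords, suffix match for the domain) are illustrative assumptions, not the patented system's actual logic:

```java
// Sketch of the priority list: a message is marked for priority delivery
// when any populated profile field matches the corresponding attribute.
public class PriorityFilter {
    public String senderName = "", senderAddress = "", senderDomain = "";
    public String subjectKeyword = "", bodyKeyword = "";

    /** True if any populated field matches the incoming message. */
    public boolean isPriority(String from, String address, String subject, String body) {
        return matches(senderName, from)
            || matches(senderAddress, address)
            || (!senderDomain.isEmpty()
                && address.toLowerCase().endsWith("@" + senderDomain.toLowerCase()))
            || matches(subjectKeyword, subject)
            || matches(bodyKeyword, body);
    }

    // Empty fields never match; populated fields match case-insensitively.
    private static boolean matches(String field, String value) {
        return !field.isEmpty() && value.toLowerCase().contains(field.toLowerCase());
    }
}
```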
- An exclude message list may be created and edited using the web browser interface (as illustrated in Block 2012).
- Messages may be excluded by sender name, sender e-mail address, sender domain, or subject line text.
- Account information may be reviewed, modified, and accounts cancelled if desired using the web browser personal profile interface (Block 2014).
- The present invention may be provided in a number of different embodiments, each of which may include various modifications and additional features.
- One particularly significant feature of the preferred embodiment is web-based user profile entry. This feature permits the user to access his or her profile from any location using Internet 130, thereby customizing the operation of the user's account at will.
- The user profile may include personal address lists; preferences with respect to the order in which e-mail messages are read during access (such as identifying particular senders for priority handling, or senders whose messages should not be read over the telephone, e.g. newsletters); form e-mail replies which are individualized for the particular user; and names and keywords which are likely to be spoken by the user during mail retrieval.
- Where the personal address lists include an entry and a telephone number associated with that entry, a voice dialing feature may be provided in which the voice command "dial <<name>>" causes placement of a telephone call to <<name>> from the personal address list.
- The system may conduct searches in response to a voice command, based on the stored personal profile. For example, a search-for-sender function may be provided ("read me the messages in my mailbox from Bill Clinton"). As another example, when a list of search keywords has been provided in the user profile, the system will load those keywords as vocabulary where appropriate, so that (for example) the user may request that mail including those keywords be read. For example, if "purchase order" is a keyword defined in the personal profile, the user may ask the system to "read me messages with subject: purchase order," and the system will recognize the words "purchase order" and select those messages including the keywords as specified.
- Mail preprocessing is provided.
- The mail preprocessing feature uses a table correlating certain symbol strings with other words. When the mail is processed, predetermined symbols or series of symbols are replaced by predetermined words before the mail is "read" to the user.
- The equivalence table provides full equivalent phrases as replacements for commonly used acronyms, and provides aurally recognizable equivalent words or phrases as replacements for "emoticons." For example, ";)" may be replaced by "wink."
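The equivalence-table substitution might look like the sketch below. Only the ";)" / "wink" pairing comes from the text; the other table entries are invented examples of the acronym and emoticon replacements described:

```java
// Sketch of mail preprocessing: replace symbol strings and acronyms with
// speakable equivalents before handing text to the TTS engine.
import java.util.*;

public class TtsPreprocessor {
    // Illustrative equivalence table; a real table would be far larger.
    private static final Map<String, String> TABLE = new LinkedHashMap<>();
    static {
        TABLE.put(";)", "wink");
        TABLE.put(":)", "smile");
        TABLE.put("FYI", "for your information");
        TABLE.put("BTW", "by the way");
    }

    /** Applies every table entry to the message text. */
    public static String prepare(String text) {
        for (Map.Entry<String, String> e : TABLE.entrySet()) {
            text = text.replace(e.getKey(), e.getValue());
        }
        return text;
    }
}
```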
- The system applies several unique features to increase processing speed.
- The system loads a predetermined limited vocabulary which is context-appropriate to the function being performed. In this way, the system need only compare the user's spoken command to a limited number of possible vocabulary words or phrases to identify the intended command. The system then compares text strings rather than comparing recorded files while processing verbal commands.
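Matching the recognizer's text result against the small active vocabulary can be sketched as below; the class name and vocabulary entries are hypothetical, and a real matcher would sit behind the speech engine rather than on raw strings:

```java
// Sketch of context-limited command matching: compare the recognized text
// string against the few vocabulary entries active in the current context.
import java.util.*;

public class VocabularyMatcher {
    /** Returns the matching vocabulary entry, or null when unrecognized. */
    public static String match(String recognizedText, List<String> activeVocabulary) {
        for (String entry : activeVocabulary) {
            if (entry.equalsIgnoreCase(recognizedText.trim())) return entry;
        }
        return null; // caller signals the failure prompt
    }
}
```

Because the comparison is over a handful of text strings rather than audio, the per-command cost stays small even as the system's total vocabulary grows.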
- The system uses a prompt when it is ready to receive a voice command.
- This prompt is a "plink" sound. Failure to recognize the user's command as one of the current vocabulary items is indicated by a different prompt, such as a double plink.
- The system is provided with specific methods of translating visual cues into audible cues in cases where an e-mail message includes such cues.
- HTML pages contain a variety of formatting, including positioning, graphical features, and variations in text appearance.
- Bold text, bullets, and other formatting may also be included in any message.
- A standardized library of sounds, tones, words, changes in voice timbre, and other audible indicators is used as the message is read, in place of the formatting, to reflect visual presentation which is important to a full appreciation of the message, yet which would not otherwise be conveyed in a purely audible transmission.
- The system preferably incorporates special methods for relaying a threaded e-mail to a user.
- Threaded e-mail refers to a message which is a forward of, or reply to, one or more messages and incorporates those previous messages in its text. Threaded e-mail may be identified, and individual messages within the e-mail may be parsed, by processing message headers included in the e-mail text, counting leading > symbols placed before the message by e-mail clients, and other methods which take into account and process the format imposed on threaded messages by various e-mail clients.
- The messages making up the e-mail may then be read or not read, selectively, based on the stored user profile.
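One of the parsing methods mentioned, counting leading > symbols, can be sketched as follows. This is a simplification for illustration; it also skips the spaces many e-mail clients insert between quote markers:

```java
// Sketch of quote-depth detection for threaded e-mail: the number of
// leading '>' markers tells how deeply a line is nested in the thread.
public class QuoteDepth {
    /** Counts leading '>' markers, ignoring interleaved spaces. */
    public static int depth(String line) {
        int d = 0, i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == '>') { d++; i++; }
            else if (c == ' ') { i++; }
            else break;
        }
        return d;
    }
}
```

Grouping consecutive lines by depth then yields the individual messages that can be read or skipped per the user profile.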
- When the user provides a spoken command during reading of e-mail, the system selectively responds to the command to either stop reading or continue reading, while implementing the command.
- The stop/no-stop operation is determined both by context and by the nature of the command. For example, the system does not stop reading on receipt of a "speak louder" or "speak faster" command, but stops in response to "send reply."
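The stop/no-stop decision can be sketched as a simple command classification. Only "speak louder" and "speak faster" are named in the text as non-stopping commands; the other entries below are illustrative additions:

```java
// Sketch of the stop/no-stop rule: volume and rate adjustments are applied
// without interrupting reading, while most other commands stop playback.
import java.util.*;

public class BargeInPolicy {
    // Illustrative set of commands handled without stopping the reading.
    private static final Set<String> NON_STOPPING =
        Set.of("speak louder", "speak softer", "speak faster", "speak slower");

    /** True if reading should stop before this command is carried out. */
    public static boolean stopsReading(String command) {
        return !NON_STOPPING.contains(command.toLowerCase());
    }
}
```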
- The noise floor on the telephone line connecting the user to the system is detected, and the recognition threshold of the voice recognition engine is changed dynamically based on the level of the noise floor. If the noise level is high, a higher level of certainty may be required before recognition of a command occurs. If the noise level is low, it may be possible to recognize a command with a lower level of certainty.
- Where the voice recognition engine operates based on a hidden Markov model, the depth of the Markov "tree" may be changed dynamically based on the noise level to achieve changes in the recognition threshold. In particular, the tree depth may be increased in the presence of more noise, and reduced in the presence of less noise.
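The noise-adaptive threshold might be sketched as a mapping from measured noise floor to required recognition confidence. The linear form and the numeric ranges are assumptions for illustration; the same idea could drive the Markov tree depth instead of a confidence value:

```java
// Sketch of noise-adaptive recognition: a noisier line demands a higher
// confidence before a command is accepted.
public class AdaptiveThreshold {
    /**
     * Maps a noise floor in [0.0, 1.0] to a confidence threshold in
     * [min, max]: quiet lines accept lower-confidence matches, noisy
     * lines demand higher confidence.
     */
    public static double threshold(double noiseFloor, double min, double max) {
        double n = Math.max(0.0, Math.min(1.0, noiseFloor)); // clamp input
        return min + n * (max - min);
    }
}
```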
- Polling of e-mail addresses supplied by users preferably occurs adaptively. That is, users who historically receive a high volume of e-mail, or a high volume during particular periods in the day, will have their mailboxes polled more often (or more often during historical peak times) than users typically receiving a low volume of mail. For example, business users may receive little e-mail during the evening, while home users may receive more of their e-mail at those times. The time zones in which business users conduct most of their business may also impact e-mail delivery patterns. Whatever the pattern of typical e-mail delivery, it is generally desirable to poll mailboxes in proportion to the likelihood that there is actually mail to be retrieved. This feature of the invention makes it possible to efficiently allocate scarce bandwidth and computing resources directed to polling a large number of mailboxes, and contributes to the large-scale capacity of the system according to the present invention.
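Proportional polling can be sketched by deriving each account's polling interval from its historical message rate for the current period of the day. The formula and bounds below are an illustrative choice, not the system's actual scheduler:

```java
// Sketch of adaptive polling: accounts expected to receive more mail in the
// current period are polled more often, within fixed bounds.
public class AdaptivePollScheduler {
    /**
     * Returns a polling interval in seconds, shrinking toward minInterval
     * as the expected messages-per-hour for this time of day grows.
     */
    public static long intervalSeconds(double expectedPerHour,
                                       long minInterval, long maxInterval) {
        if (expectedPerHour <= 0) return maxInterval; // idle accounts: poll rarely
        long interval = (long) (3600.0 / expectedPerHour); // ~one poll per expected message
        return Math.max(minInterval, Math.min(maxInterval, interval));
    }
}
```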
- An experience level indicator is maintained for each voice command, within each user profile.
- The experience level indicator reflects the user's familiarity with each available voice command.
- As the user demonstrates expertise with a command, the experience level indicator is changed to reflect that expertise. If the user has demonstrated successful use of a feature several times, then going forward, a reduced level of instruction and assistance may be provided in voice dialog scripts during use of the system when that feature is made available or is in use.
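The experience indicator can be sketched as a per-command success counter that switches the dialog to terse prompts after repeated successful use. The class name and the threshold of three successes are assumptions:

```java
// Sketch of the per-command experience level: count successful uses and
// reduce instruction once the user has demonstrated familiarity.
import java.util.*;

public class ExperienceTracker {
    private final Map<String, Integer> successes = new HashMap<>();

    /** Records one successful use of a voice command. */
    public void recordSuccess(String command) {
        successes.merge(command, 1, Integer::sum);
    }

    /** Experienced users get terse prompts; novices get full instructions. */
    public boolean useTersePrompt(String command) {
        return successes.getOrDefault(command, 0) >= 3;
    }
}
```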
- The present invention also provides that voice commands from a user may be received and acted upon either during silence between outputs of the text-to-speech engine or during the time that the text-to-speech engine is sending voice to the user.
- Voice commands may be received only when the text-to-speech engine (or voice prompts) are not playing.
- A "voice barge-in" feature may be provided, whereby a user may talk over prompts or the text-to-speech engine with commands.
- A user may say the command "cancel" or "stop" to stop reading of a message, as opposed to a DTMF input.
- An echo cancellation circuit (similar to that used by a speakerphone) may be used to prevent voice prompts or e-mail messages from being perceived as voice inputs.
- One method of the present invention is to anticipate reaction to voice dialog and retrieve data in anticipation of such voice dialog.
- A portion of such data may be initially read, such that the text-to-speech engine can read header data while other header data is being received, so as to maintain a continuous speech output without interruptions or pauses which would be annoying to a user.
- The present invention is embodied in an apparatus, system, and method employed by the assignee of the present application, CrossMedia.
- The present invention may be demonstrated by calling 877-246-DEMO.
- The XM Resource Manager abstracts the voice user interface application from the core speech technology engine. By leveraging this advanced feature, any application written with CrossMedia's technologies may instantly take advantage of new innovations in voice technology as soon as they are commercially available.
- The open architecture approach of the XM Resource Manager allows users and applications to be insulated from the complexities of the underlying voice technology, simplifying programming and speeding the adoption of new technology innovations.
- XM Dialog Manager - CrossMedia incorporates both ASR and TTS engines, and manages real-time allocation of speech resources, including the user-specific grammars and vocabularies required for the effective development of voice dialog applications.
- XM Scripting Language - CrossMedia provides a simple Applications Programmer Interface (API) enabling new applications to be developed quickly.
- API: Application Programmer Interface
- GUI: Graphical User Interface
- The processor translates content into a clear format when read by a text-to-speech engine using CrossMedia's TTS Conditioning.
- The XM TTS Preprocessor does extensible parsing and translation to provide auditory meaning to information intended to be read.
- The personal profiler enables users to set system preferences for use with CrossMedia's Voice Email and Voice Activated Dialing products. Examples include telephone numbers, email addresses, standard email replies, and email filters. This module is written as JAVA servlets with an SQL interface to the system database for storage.
- This software provides a mechanism for getting copies of a user's email. This is accomplished in one of two ways: email polling and forwarding.
- The software is written in JAVA and can be easily ported to various platforms.
- XM Email Message Classifier - CrossMedia has developed a powerful, rules-based message management system for classifying messages for filtering and routing.
- The filtering function enables a user to hear only those messages deemed to be important, filtering out other messages.
- XM Applications Gateway - CrossMedia will develop a family of Applications Gateways to access various email systems and database information sources.
- The gateways will be developed in JAVA and may be architecturally distributed.
- XM Resource Manager - This provides expansion capability to meet the demands of large marketing partners.
- The current system can handle over 50,000 mailboxes and can be expanded by installing additional servers to handle several hundred thousand to over one million mailboxes.
- Context-sensitive Active Grammar - This supports the recognition of millions of words and phrases needed for conversational voice-accessible applications.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU64997/99A AU6499799A (en) | 1998-09-24 | 1999-09-24 | Interactive voice dialog application platform and methods for using the same |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10193098P | 1998-09-24 | 1998-09-24 | |
US60/101,930 | 1998-09-24 |
Publications (3)
Publication Number | Publication Date |
---|---|
WO2000018100A2 true WO2000018100A2 (en) | 2000-03-30 |
WO2000018100A3 WO2000018100A3 (en) | 2000-09-08 |
WO2000018100A9 WO2000018100A9 (en) | 2002-04-11 |
Family
ID=22287229
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1999/022145 WO2000018100A2 (en) | 1998-09-24 | 1999-09-24 | Interactive voice dialog application platform and methods for using the same |
Country Status (2)
Country | Link |
---|---|
AU (1) | AU6499799A (en) |
WO (1) | WO2000018100A2 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8379830B1 (en) | 2006-05-22 | 2013-02-19 | Convergys Customer Management Delaware Llc | System and method for automated customer service with contingent live interaction |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4837798A (en) * | 1986-06-02 | 1989-06-06 | American Telephone And Telegraph Company | Communication system having unified messaging |
US4932021A (en) * | 1989-04-03 | 1990-06-05 | At&T Bell Laboratories | Path learning feature for an automated telemarketing system |
US5566272A (en) * | 1993-10-27 | 1996-10-15 | Lucent Technologies Inc. | Automatic speech recognition (ASR) processing using confidence measures |
US5594784A (en) * | 1993-04-27 | 1997-01-14 | Southwestern Bell Technology Resources, Inc. | Apparatus and method for transparent telephony utilizing speech-based signaling for initiating and handling calls |
US5608786A (en) * | 1994-12-23 | 1997-03-04 | Alphanet Telecom Inc. | Unified messaging system and method |
US5652789A (en) * | 1994-09-30 | 1997-07-29 | Wildfire Communications, Inc. | Network based knowledgeable assistant |
US5675507A (en) * | 1995-04-28 | 1997-10-07 | Bobo, Ii; Charles R. | Message storage and delivery system |
US5715466A (en) * | 1995-02-14 | 1998-02-03 | Compuserve Incorporated | System for parallel foreign language communication over a computer network |
US5740231A (en) * | 1994-09-16 | 1998-04-14 | Octel Communications Corporation | Network-based multimedia communications and directory system and method of operation |
US5825854A (en) * | 1993-10-12 | 1998-10-20 | Intel Corporation | Telephone access system for accessing a computer through a telephone handset |
-
1999
- 1999-09-24 WO PCT/US1999/022145 patent/WO2000018100A2/en active Application Filing
- 1999-09-24 AU AU64997/99A patent/AU6499799A/en not_active Abandoned
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1320963A1 (en) * | 2000-09-06 | 2003-06-25 | Xanboo, Inc. | Adaptive method for polling |
EP1320963A4 (en) * | 2000-09-06 | 2007-05-16 | Xanboo Inc | Adaptive method for polling |
GB2380379A (en) * | 2001-06-04 | 2003-04-02 | Hewlett Packard Co | Speech system barge in control |
GB2380379B (en) * | 2001-06-04 | 2005-10-12 | Hewlett Packard Co | Speech system barge-in control |
US7062440B2 (en) | 2001-06-04 | 2006-06-13 | Hewlett-Packard Development Company, L.P. | Monitoring text to speech output to effect control of barge-in |
GB2377119A (en) * | 2001-06-27 | 2002-12-31 | 365 Plc | Interactive voice response system |
US7912186B2 (en) | 2004-10-20 | 2011-03-22 | Microsoft Corporation | Selectable state machine user interface system |
US8090083B2 (en) | 2004-10-20 | 2012-01-03 | Microsoft Corporation | Unified messaging architecture |
EP1705886A1 (en) * | 2005-03-22 | 2006-09-27 | Microsoft Corporation | Selectable state machine user interface system |
DE102006058552B4 (en) * | 2005-12-12 | 2010-09-23 | Honda Motor Co., Ltd. | reception system |
US8074199B2 (en) | 2007-09-24 | 2011-12-06 | Microsoft Corporation | Unified messaging state machine |
CN112230878A (en) * | 2013-03-15 | 2021-01-15 | 苹果公司 | Context-sensitive handling of interrupts |
Also Published As
Publication number | Publication date |
---|---|
AU6499799A (en) | 2000-04-10 |
WO2000018100A3 (en) | 2000-09-08 |
WO2000018100A9 (en) | 2002-04-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6651042B1 (en) | System and method for automatic voice message processing | |
US9088652B2 (en) | System and method for speech-enabled call routing | |
US6327343B1 (en) | System and methods for automatic call and data transfer processing | |
US6366882B1 (en) | Apparatus for converting speech to text | |
US7242752B2 (en) | Behavioral adaptation engine for discerning behavioral characteristics of callers interacting with an VXML-compliant voice application | |
US8654940B2 (en) | Dialect translator for a speech application environment extended for interactive text exchanges | |
US6507643B1 (en) | Speech recognition system and method for converting voice mail messages to electronic mail messages | |
EP1602102B1 (en) | Management of conversations | |
US7609829B2 (en) | Multi-platform capable inference engine and universal grammar language adapter for intelligent voice application execution | |
US5644625A (en) | Automatic routing and rerouting of messages to telephones and fax machines including receipt of intercept voice messages | |
US7260530B2 (en) | Enhanced go-back feature system and method for use in a voice portal | |
US20050234727A1 (en) | Method and apparatus for adapting a voice extensible markup language-enabled voice system for natural speech recognition and system response | |
GB2323694A (en) | Adaptation in speech to text conversion | |
US20120046951A1 (en) | Numeric weighting of error recovery prompts for transfer to a human agent from an automated speech response system | |
JP2003244317A (en) | Voice and circumstance-dependent notification | |
US6813342B1 (en) | Implicit area code determination during voice activated dialing | |
US8085927B2 (en) | Interactive voice response system with prioritized call monitoring | |
WO2000018100A2 (en) | Interactive voice dialog application platform and methods for using the same | |
US20030055649A1 (en) | Methods for accessing information on personal computers using voice through landline or wireless phones | |
US20060077967A1 (en) | Method to manage media resources providing services to be used by an application requesting a particular set of services | |
CN1380782A (en) | Automatic information system | |
RU2763691C1 (en) | System and method for automating the processing of voice calls of customers to the support services of a company | |
US7327832B1 (en) | Adjunct processing of multi-media functions in a messaging system | |
Chou et al. | Natural language call steering for service applications. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
ENP | Entry into the national phase |
Ref country code: AU Ref document number: 1999 64997 Kind code of ref document: A Format of ref document f/p: F |
|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
AK | Designated states |
Kind code of ref document: A3 Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A3 Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
AK | Designated states |
Kind code of ref document: C2 Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: C2 Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
COP | Corrected version of pamphlet |
Free format text: PAGES 1-105, DESCRIPTION, REPLACED BY NEW PAGES 1-100; PAGES 106-125, CLAIMS, REPLACED BY NEW PAGES101-119; PAGES 1/25-25/25, DRAWINGS, REPLACED BY NEW PAGES 1/25-25/25; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE |
|
122 | Ep: pct application non-entry in european phase |