USH2187H1 - System and method for gender identification in a speech application environment
- Publication number
- USH2187H1 (application US10/186,049)
- Authority
- US
- United States
- Prior art keywords
- gender
- user
- words
- grammar
- state
- Prior art date
- Legal status
- Abandoned
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/263—Language identification
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/227—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of the speaker; Human-factor methodology
Definitions
- the present invention relates to the field of speech applications involving a dialogue between a human and a computer. More specifically, the present invention relates to a system and method for improving the performance of a speech application deployed in an environment in which the written language (which may be any language set down in writing in any of various ways) of a user exhibits gender specific characteristics.
- a conversation (dialogue) between two entities is a series of exchanges in which each participant listens to at least some part of what the other participant says and reacts by speaking or performing some action.
- Creating speech applications, which are computer applications that engage in such dialogues with people, is a complex task.
- a speech application typically proceeds in accordance with a call flow that defines the dialogue between a user and the computer on which the speech application executes.
- the call flow of a speech application is typically comprised of a series of “states,” which correspond to different stages in the dialogue (e.g. initial state, get-identity-of-speaker state, take-first-item-in-order state, etc.).
- Each of these states is typically associated with a “prompt,” which the speech application may use to prompt the user, a set of expected “responses,” which the speech application can expect from the user, and a way to process the prompt given, response received, and any other external data to perform an action or to move to another state.
- a speech application must be able to detect an utterance (e.g., “response”) spoken by a user and convert it into some non-audio representation, for example, into text.
- a speech application typically relies on an automatic speech recognizer (ASR) to perform this task. Once an ASR determines what a speaker has said, the ASR itself, or in some cases another component such as a natural language interpreter, may receive the non-audio representation of the utterance and based on that utterance, the state of the conversation to that point, and any external factors that need to be considered, determine the meaning of the utterance.
- ASRs are available commercially from a variety of different vendors. Examples of commercially available ASRs include the Nuance product commercially available from Nuance Communications, Inc., the SpeechPearl product commercially available from Philips Electronics N.V., and the OpenSpeech Recognizer commercially available from SpeechWorks International, Inc.
- An ASR is “speaker independent” if it does not need to have heard the speaker's voice before in order to recognize the speaker's utterance.
- An ASR is “continuous” if it does not require the speaker to pause between words.
- a speech application cannot know for certain what the user will say or how the user will say it.
- a useful speech application should be constructed to be ready for all reasonable contingencies.
- the speech application “listens” via an ASR for one response from a set of responses using a “grammar” for those responses. That is, the speech application “loads” a particular grammar into the ASR for a given set of expected responses. This grammar specifies everything the ASR will listen for when it is listening for a given response.
- a grammar for the expected reply to the prompt “What method of shipping would you like to use?” might be represented as ((“I want to use” | “I'd like to use” | “Please use”)? (“regular mail” | “express shipping” | “next day mail”)), where “?” marks the preamble as optional.
- the expected replies would be “I want to use regular mail”; “I'd like to use regular mail”; “Please use regular mail”; “Regular mail”; “I want to use express shipping”; “I'd like to use express shipping”; “Please use express shipping”; “Express shipping”; “I want to use next day mail”; “I'd like to use next day mail”; “Please use next day mail”; or “Next day mail”.
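The relationship between such a grammar rule and its twelve expected replies can be sketched programmatically. This is an illustrative expansion only; the function and constant names are assumptions, not part of any ASR vendor's API.

```python
# Illustrative sketch: expand the shipping-method grammar into its full
# set of expected replies. The optional preamble phrases and the bare
# choices mirror the example above; all names here are hypothetical.
PREAMBLES = ["I want to use", "I'd like to use", "Please use"]
CHOICES = ["regular mail", "express shipping", "next day mail"]

def expand_grammar(preambles, choices):
    """Return every utterance the grammar accepts."""
    replies = []
    for choice in choices:
        for pre in preambles:
            replies.append(f"{pre} {choice}")
        replies.append(choice.capitalize())  # bare choice, e.g. "Regular mail"
    return replies

def accepts(utterance):
    """True if the utterance matches the grammar (case-insensitive)."""
    return utterance.lower() in {r.lower() for r in expand_grammar(PREAMBLES, CHOICES)}
```

Expanding the rule this way yields exactly the twelve expected replies listed above; a real ASR matches against the compiled grammar directly rather than against an enumerated list.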
- One common format for such grammars is Backus-Naur Form (BNF).
- Other formats can be used.
- One such format is the XML format promulgated by the W3C organization.
- ASRs typically have their own grammar file format specified by the vendor.
- a speech application developer is required to adhere to the grammar format of the specific ASR being used.
- Development tools are available to aid a developer in generating the necessary grammars for a given speech application.
- One such tool is the Natural Language Speech Assistant (NLSA) developed by Unisys Corporation, assignee of the present invention. Further information concerning this tool is provided in U.S. Pat. No. 5,995,918, issued Nov. 30, 1999, entitled “System and Method for Creating a Language Grammar Using a Spreadsheet or Table Interface.”
- a gender-neutral grammar can become quite large as it has to be capable of handling both the male and female versions of various phrases.
- the larger a grammar becomes the less accurate an ASR will perform, as there are more opportunities for mistakes and misrecognitions.
- the speed of recognition is also affected when grammars become large. Consequently, there is a need for systems and methods for improving the speech recognition accuracy in, and overall dialogue design of, speech applications intended to be used with speakers whose written languages exhibit these kinds of gender specific characteristics. The present invention addresses this need.
- the present invention is primarily directed to a system and method for improving the performance of a speech application deployed in an environment in which the written language of a user exhibits gender specific characteristics, such as the Russian, Ukrainian, and Polish languages mentioned above.
- an automatic speech recognizer is used in conjunction with a gender-neutral grammar to recognize words uttered by a user of the speech application at a given state of the dialogue implemented by the speech application.
- An identification of the gender (i.e., male or female) of the user is then made from one or more of the recognized words based on gender-specific characteristics of the written language of the user.
- the gender identification may then be used at a subsequent state of the dialogue to select a grammar specific to the identified gender of the user.
- Use of a gender specific grammar may increase the accuracy of subsequent recognition attempts.
- the speech application may compare a gender identification made at a prior state of the dialogue with a gender identification made at a subsequent state and may adjust a confidence level associated with a recognition of words by the ASR at that subsequent state based on a comparison of those gender identifications. For example, if the gender of a user is identified from the recognized words uttered by the user at a first state with a relatively high confidence level, and then the words recognized at a subsequent state indicate a different gender, then the confidence level associated with the recognition of words at that subsequent state may be lowered since the gender identification does not match that of the previous state, i.e., the mismatch between the gender identifications suggests a possible misrecognition.
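A minimal sketch of this cross-state confidence adjustment, assuming a 0.0-1.0 confidence scale and an illustrative penalty factor (neither scale nor factor is specified by the text):

```python
# Hedged sketch of the cross-state confidence check described above.
# The function name, tag values, and the penalty factor are assumptions.
def adjust_confidence(current_conf, current_gender,
                      prior_gender, prior_conf,
                      threshold=0.8, penalty=0.5):
    """Lower a recognition's confidence when its implied gender
    contradicts a high-confidence gender identified at a prior state."""
    if (prior_gender in ("M", "F")
            and prior_conf >= threshold
            and current_gender in ("M", "F")
            and current_gender != prior_gender):
        return current_conf * penalty  # mismatch suggests a possible misrecognition
    return current_conf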
- the identification of the gender of a user may be used by the speech application to provide gender-specific prompts to the user at various states of the dialogue.
- the invention may also be applied to written communications, and prompts and grammars used to interpret written communications can also be modified in the same way as described here for oral communications with a user.
- a spy program in a chat room or a computer-based psychotherapy program, or even a banking interface program, will have much greater success in convincing a user that he or she is corresponding with another human, or at least raising the comfort level of the user, if the gender-specific parts of speech are appropriately used in the written conversation.
- the invention's primary present purpose is for oral communications using speech recognizer applications.
- FIG. 1 illustrates the basic processing flow of an exemplary speech application.
- FIG. 2 is a flow diagram illustrating one embodiment of a method of the present invention.
- FIG. 3 is a diagram providing further details regarding an aspect of the present invention.
- FIG. 4 is a block diagram illustrating one embodiment of a system in which the present invention may be implemented.
- the present invention is directed to a system and method for improving the performance of a speech application deployed in an environment in which the written language of a user exhibits gender specific characteristics, such as the Russian, Ukrainian, and Polish languages mentioned above.
- an automatic speech recognizer is used in conjunction with a gender-neutral grammar to recognize words uttered by a user of the speech application at a given state of the dialogue implemented by the speech application.
- An identification of the gender (i.e., male or female) of the user is then made from one or more of the recognized words based on gender-specific characteristics of the written language of the user.
- a speech application that is part of an interactive voice response (IVR) system that provides customer service information to cell phone users may prompt a user with the question “What happened to your cell phone?”
- One response to such a question may be “I lost it.”
- in languages such as Russian, Ukrainian, and Polish, that response may be verbalized differently (as well as written differently) depending upon the gender of the speaker.
- past tense verbs are written and spoken differently depending on the gender of the writer/speaker.
- a gender-neutral grammar for an automatic speech recognizer (ASR) designed to enable the ASR to listen for either of the gender-based forms of the Russian expression of the phrase “I lost it” would include one grammar rule for the expression of the phrase by a male and another grammar rule for the expression of the phrase by a female. An identification of the gender of the speaker can thus be made from the words recognized by the ASR based on the gender-specific characteristics of the written language of the user.
- if the ASR recognizes the speaker to have used the male form of the phrase, the gender of the speaker would be identified as “male,” whereas if the ASR recognizes the female form, the gender of the speaker would be identified as “female.”
- An automatic speech recognizer typically returns an ASCII representation of what it recognizes a speaker to have said, along with a value indicative of a confidence level associated with that recognition, i.e., a value that expresses how confident the ASR is that the recognized words are indeed what the speaker said.
- an identification of the gender of the speaker based on the recognized words uttered by a user as described above, would only be made if the confidence level associated with the recognition of those words is above a certain threshold level. For example, assume that the confidence level associated with a given recognition attempt is expressed as a percentage, 0% being the lowest and 100% being the highest.
- One embodiment of the present invention may set a confidence level of 80% as the threshold with respect to which a gender identification will be made.
- if the ASR recognizes the speaker to have used the male form of the phrase but the confidence level associated with that recognition is only 60%, then a gender identification will not be made based on that utterance. If, on the other hand, the confidence level associated with that recognition attempt was 95%, then the speaker's gender would be identified as “male.” In other embodiments, however, the confidence level may play no role in the gender identification.
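The threshold-gated identification described above can be sketched as follows. The 80% threshold comes from the text; the "M"/"F"/"N" tags and the function name are illustrative assumptions:

```python
# Minimal sketch of threshold-gated gender identification. A real ASR
# result object is vendor-specific; the gender tag here is hypothetical.
def identify_gender(recognized_gender_tag, confidence, threshold=0.80):
    """Return "M" or "F" only when the recognition is confident enough;
    otherwise return "N" (undetermined / neutral)."""
    if confidence >= threshold and recognized_gender_tag in ("M", "F"):
        return recognized_gender_tag
    return "N"
```

This mirrors the 60% / 95% example above: the low-confidence recognition yields no gender identification, while the high-confidence one does.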
- a gender identification made in the manner described above can be used in several ways to improve the overall performance of the speech application.
- the gender identification may be used at a subsequent state of the speech application dialogue to select a grammar specific to the identified gender of the user.
- the speech application may compare a gender identification made at one state of a speech application dialogue with a gender identification made at a subsequent state and may then adjust the confidence level associated with a recognition of words by the ASR at that subsequent state based on a comparison of those gender identifications.
- the identification of the gender of a user may be used by the speech application to provide gender-specific prompts to the user at various states of the dialogue.
- FIG. 1 is a flow diagram illustrating the general processing flow of an exemplary speech application.
- a speech application typically implements a dialogue between a user and a computer in order to provide some service to the user, such as Voice Mail, Bank By Phone, Emergency Number Facilities, Directory Assistance, Operator Assistance, Call Screening, Automatic Wake-up Services, and the like. Speech applications are an integral part of many interactive voice response (IVR) systems in use today.
- the dialogue that a speech application carries out is often expressed as a series of interconnected states, e.g., BEGIN DIALOGUE, STATE 1, STATE 2, STATE 3, TERMINATE DIALOGUE, etc., that define the flow of the dialogue.
- the dialogue may transition from STATE 1 to either STATE 2 or STATE 3, and then end at the TERMINATE DIALOGUE state.
- each state of a dialogue represents one conversational interchange between the application and a user. Components of a state are defined in the following table:
Component | Function | Examples |
Prompt | Defines what the speech application says to the end user | Would you like to place an order? |
Response | Defines every possible user response to the prompt, including its implications to the application (i.e., meaning, content) | YES (yes, yes please, certainly . . .); NO (No, not right now, no thanks . . .); HELP (Help, How do I do that . . .); OPERATOR (I need to talk to a person) |
Action | Defines the action to be performed for each response based on current conditions | YES/System Available - go to PLACEORDER state; YES/System Unavailable - go to CALLBACKLATER state . . . |
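The three state components above can be modeled as a simple data structure. This is a hypothetical representation for illustration, not the actual format used by any speech application toolkit:

```python
from dataclasses import dataclass, field

# Illustrative model of a dialogue state: the field names mirror the
# Component column of the table above; the contents are the examples.
@dataclass
class DialogueState:
    prompt: str                                    # what the application says
    responses: dict = field(default_factory=dict)  # token -> example phrasings
    actions: dict = field(default_factory=dict)    # (token, condition) -> next state

order_state = DialogueState(
    prompt="Would you like to place an order?",
    responses={"YES": ["yes", "yes please", "certainly"],
               "NO": ["no", "not right now", "no thanks"]},
    actions={("YES", "system_available"): "PLACEORDER",
             ("YES", "system_unavailable"): "CALLBACKLATER"},
)
```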
- FIG. 2 illustrates the processing performed by a speech application at any given state of the dialogue.
- the speech application plays a prompt to the user (e.g., “Would you like to place an order?”, “What happened to your cell phone?”, etc.).
- the prompt is typically “played” to a user by outputting an audio signal over whatever interface through which the user is interacting with the speech application.
- the user may be interacting with the speech application over a telephone handset, or the user may be using a microphone and speakers attached directly to the computer that hosts the application.
- the prompt may be played over a Voice-over-IP (VOIP) connection.
- the prompt may initially be in an ASCII format and then converted to an audio signal via a text-to-speech (TTS) converter.
- the prompt may have been prerecorded and stored on the computer that hosts the speech application such that it can be retrieved from storage and played to the user over the particular audio interface.
- the speech application prepares an automatic speech recognizer (ASR) for the response phase of the given state by, among other things, loading the ASR with the appropriate grammar containing the rules for recognizing the set of expected responses that a user may utter at that state.
- the speech application is deployed in an environment in which the written language of the speaker exhibits gender specific characteristics (such as Russian, Ukrainian, and Polish)
- the ASR may be loaded with a gender-neutral grammar containing grammar rules for expected responses that may be uttered by both male and female users.
- ASRs are available commercially from a variety of different vendors.
- Examples include the Nuance product commercially available from Nuance Communications, Inc., the SpeechPearl product commercially available from Philips Electronics N.V., and the OpenSpeech Recognizer commercially available from SpeechWorks International, Inc.
- Commercially available ASRs typically have their own grammar file format specified by the vendor. A speech application developer is required to adhere to the grammar format of the specific ASR being used when developing grammars for the ASR.
- the ASR provides the results of its attempt to recognize an utterance by the user.
- an ASR typically returns an ASCII representation of what it recognizes a speaker to have said, along with a value indicative of a confidence level associated with that recognition, i.e., a value that expresses how confident the ASR is that the recognized words are indeed what the speaker said.
- the ASR may also perform a natural language understanding function to return an indication of the meaning of the recognized words. In other embodiments, this natural language understanding function may be performed by a separate component, sometimes referred to as a natural language interpreter (NLI).
- An example of an NLI used to determine the meaning of an utterance as recognized by an ASR is found in U.S. Pat. No. 6,094,635 (in which it is referred to as a “runtime interpreter”) and in U.S. Pat. No. 6,321,198 (in which it is referred to as the “Runtime NLI”).
- the results of the speech recognition attempt by the ASR are analyzed in step 230 in an attempt to identify from those results the gender of the user.
- the gender of the speaker is identified from the recognized words uttered by the user based on gender-specific characteristics of the written language of the user.
- the grammar rule for a male expression of a given phrase may be associated with a value, e.g., “M”, and the grammar rule for a female expression of a given phrase may be associated with a value, e.g., “F.”
- the ASR may provide such a value to the speech application as part of the results of the recognition attempt. Again, as discussed above, whether the ASR provides such an indication may depend upon a confidence level associated with the particular recognition attempt.
- the speech application itself could make a gender identification from the ASCII representation output from the ASR. In either case, the speech application may define a global variable, such as “GENDER,” that holds a value such as “M,” “F,” or “N” to indicate that the gender of the user is male, female, or undetermined (i.e., neutral).
- the speech application dialogue may be designed to play, at a relatively early state in the dialogue, a prompt to the user that is likely to elicit a gender-specific response based on the gender-specific characteristics of the written language of the user.
- a gender identification can be made early in the dialogue to enable the remainder of the dialogue to take advantage of that identification in the manners described below.
- the gender identification obtained in step 230 may be used by the system at a subsequent state to load a gender-specific grammar (i.e., male or female) for that state at step 210, instead of a gender-neutral grammar, as discussed further below.
- the gender identification may also be used in step 200 at a subsequent state to alter the prompts offered for further communication based on knowing the gender of the user where such gender-specific prompts may be appropriate.
- a gender identification made at a given state of a speech application dialogue may be used at a subsequent state to select a grammar for use at that state that is specific to the identified gender of the user.
- the speech application developer will create, for each of at least some of the states of the speech application dialogue, both gender-neutral and gender-specific grammars for that state.
- the developer will do the same for other states at which the expected utterance from a user may reflect similar gender-specific characteristics of the written language of the user. (Concomitantly with the use of grammars (2) and (3), appropriate gender-specific prompts may be used as well, where appropriate.)
- use of a gender-specific grammar may enhance the accuracy of the ASR at such subsequent states, because a gender-specific grammar has fewer grammar rules than a gender-neutral grammar (since it only needs rules to recognize utterances by one gender).
- FIG. 3 is a diagram illustrating further details of how a gender identification can be used to select gender-neutral and gender-specific grammars at various states of a dialogue in accordance with one embodiment of the invention.
- a speech application will initially utilize the gender-neutral grammars at various states of a dialogue. However, if a gender identification is made at a particular state in the manner described above based on the results obtained with a gender-neutral grammar, and that recognition has a high confidence level, then as shown at 340, the speech application may transition to the use of gender-specific grammars at subsequent states of the dialogue, as shown at 310.
- the speech application will continue to employ the appropriate gender-specific grammar based on the previous gender identification (male/female) as long as the confidence associated with the recognition results obtained using those gender-specific grammars remains high, as illustrated at 330. If, however, at a given state of the dialogue, the use of a gender-specific grammar results in one or more recognitions with low confidence levels, the speech application may transition back to the use of the gender-neutral grammars, as illustrated at 350. The speech application will continue to use the gender-neutral grammars if recognition attempts continue to produce low confidence results, or in the event of a misrecognition or silence by the user. Again, however, if at any subsequent state a gender identification is made with high confidence, then the speech application will once again transition to the use of the appropriate gender-specific grammar.
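The transitions of FIG. 3 amount to a small state machine over grammar modes. A hedged sketch, with the mode names and the confidence threshold chosen for illustration:

```python
# Sketch of the grammar-selection logic of FIG. 3. Mode names and the
# threshold are assumptions; the transitions mirror 330/340/350 above.
def next_grammar(current, gender, confidence, threshold=0.8):
    """Pick the grammar mode for the next dialogue state.
    current is "NEUTRAL", "MALE", or "FEMALE"."""
    if confidence < threshold:
        return "NEUTRAL"      # low confidence: fall back to gender-neutral (350)
    if gender == "M":
        return "MALE"         # confident male identification (340)
    if gender == "F":
        return "FEMALE"       # confident female identification (340)
    return current            # confident result, no gender cue: keep mode (330)
```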
- the speech application may compare a gender identification made at a prior state of the dialogue from a gender-neutral grammar with a gender identification made at a subsequent state with another gender-neutral grammar and may then adjust a confidence level associated with a recognition of words by the ASR at that subsequent state based on a comparison of those gender identifications.
- an ASR may be loaded with a gender-neutral grammar and a gender identification may be made in the manner described above from the recognized words uttered by a user based on gender-specific characteristics of the written language of the user.
- the ASR may again be loaded with a gender-neutral grammar for that state and a recognition attempt made. That recognition attempt may also produce a gender identification made in the manner described above.
- the gender identification made at this subsequent state may be compared to the gender identification made at the prior state, and an adjustment may be made to the confidence level associated with the results of the recognition at the subsequent state based on the comparison.
- the gender identification can be used as a further measure of the confidence associated with a particular recognition attempt.
- a gender identification made at a given state of a speech application dialogue in the manner described above may be used by the speech application at subsequent states to select prompts to be played to the user that are more appropriate for the identified gender. For example, after identifying the gender of a user as “male,” subsequent prompts may address the user with the salutation “Mr.,” whereas if the gender of the user is identified as “female,” subsequent prompts may address the user with the salutation “Miss” or “Mrs.”
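A minimal sketch of this salutation-based prompt selection; the helper function and its mapping are assumptions for illustration:

```python
# Hypothetical helper for the gender-specific prompts described above.
SALUTATIONS = {"M": "Mr.", "F": "Mrs."}  # "Miss" is an alternative for "F"

def make_prompt(gender, surname, body):
    """Prefix a prompt with a gender-appropriate salutation when known."""
    salutation = SALUTATIONS.get(gender)
    if salutation:
        return f"{salutation} {surname}, {body}"
    return body  # gender undetermined: keep the prompt neutral
```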
- FIG. 4 is a block diagram illustrating an exemplary system in which the present invention may be embodied.
- the system comprises a speech application 400 that carries out a dialogue with a user, as described above, wherein the dialogue comprises a plurality of states.
- the speech application may be implemented in a high level programming language, such as, for example, C, C++, or Java.
- the program code may be implemented in assembly or machine language.
- the language may be a compiled or an interpreted language.
- the speech application may also be developed using any of a variety of commercially available speech application development tools, including, for example, the Natural Language Speech Assistant (NLSA) available from Unisys Corporation.
- the speech application 400 interfaces with an automatic speech recognizer (ASR) 410 that, at the direction of the speech application, recognizes words uttered by a user in response to a prompt at a given state of the dialogue based on a grammar specified by the speech application for that state.
- the ASR may comprise any commercially available or proprietary speech recognizer.
- the ASR comprises a speaker independent, continuous speech recognizer.
- a user interfaces with the speech application 400 using a telephone 450 connected to the public switched telephone network (PSTN) 440.
- a telephony interface 430 provides an interface between the PSTN 440 and the speech application 400 and ASR 410 in a conventional manner.
- the user may interface with the speech application in other ways, such as via a microphone and speakers attached to the computer on which the speech application executes, via a voice-over-IP (VOIP) connection, or via a voiceXML browser.
- the system may further comprise a natural language interpreter (NLI) 420, in the event that its functionality is not provided as part of the ASR 410.
- the NLI accesses a given grammar, which expresses valid utterances, and associates them with tokens and provides other information relevant to the application.
- the NLI extracts and processes a user utterance based on the grammar to provide information useful to the application, such as a token representing the meaning of the utterance. This token may then, for example, be used to determine what action the speech application will take in response.
- the operation of an exemplary NLI is described in U.S. Pat. No. 6,094,635 (in which it is referred to as the “runtime interpreter”) and in U.S. Pat. No. 6,321,198 (in which it is referred to as the “Runtime NLI”).
- the system further comprises a memory device 460 that stores, for each of at least some of the states of the speech application dialogue, a gender-neutral grammar for that state.
- the memory device 460 stores gender-neutral grammars 470, 480, 490 (designated G1N, G2N . . . GxN, respectively) that are each associated with a given state of the dialogue (e.g., State 1, State 2 . . . State x, etc.).
- the memory device 460 may also store, for at least some of the states of the dialogue, gender-specific grammars (472, 474, 482, 484, 492, 494) for those states (e.g., grammars G1M and G1F for State 1, grammars G2M and G2F for State 2, grammars GxM and GxF for State x, and so on).
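The per-state grammar store of FIG. 4 can be sketched as a lookup keyed by state and gender that falls back to the gender-neutral grammar. The placeholder grammar names follow the figure's designations; the dictionary layout is an assumption:

```python
# Illustrative store of the grammars of FIG. 4: each state keeps a
# gender-neutral grammar (GxN) plus optional male/female variants.
# Grammar contents are placeholders, not real grammar files.
GRAMMARS = {
    ("State 1", "N"): "G1N", ("State 1", "M"): "G1M", ("State 1", "F"): "G1F",
    ("State 2", "N"): "G2N", ("State 2", "M"): "G2M", ("State 2", "F"): "G2F",
}

def select_grammar(state, gender):
    """Prefer the gender-specific grammar; fall back to gender-neutral."""
    return GRAMMARS.get((state, gender), GRAMMARS[(state, "N")])
```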
- the memory device 460 may comprise any computer-readable storage medium, such as a floppy diskette, CD-ROM, CD-RW, CD-R, DVD-ROM, DVD-RAM, hard disk drive, magnetic tape or any other magnetic, optical, or otherwise machine-readable storage medium.
- the system illustrated in FIG. 4 may be used to carry out any of the aspects of the present invention described above.
- the ASR 410 may use one of the gender-neutral grammars 470, 480, 490 at a particular state to recognize words uttered by a user of the speech application 400, and the system may then identify a gender (i.e., male or female) of the user from one or more of the recognized words based on gender-specific characteristics of the written language of the user.
- the gender identification may be used at a subsequent state of the speech application dialogue to select a grammar specific to the identified gender of the user (e.g., one of grammars 472, 474, 482, 484, 492, 494, etc.).
- the speech application 400 may compare a gender identification made at one state of the dialogue with a gender identification made at a subsequent state and may then adjust the confidence level associated with a recognition of words by the ASR 410 at that subsequent state based on a comparison of those gender identifications.
- the identification of the gender of a user may also be used by the speech application 400 to provide gender-specific prompts to the user at various states of the dialogue.
- the system of the present invention can be implemented on any of a variety of computing platforms and is in no way limited to any one computing platform or speech application development environment.
- the methods and system described above may be embodied in the form of program code (i.e., instructions) stored on a computer-readable medium, such as a floppy diskette, CD-ROM, DVD-ROM, DVD-RAM, hard disk drive, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
- the present invention may also be embodied in the form of program code that is transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, over a network, including the Internet or an intranet, or via any other form of transmission, wherein, when the program code is received and loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
- when implemented on a general-purpose processor, the program code combines with the processor to provide a unique apparatus that operates analogously to specific logic circuits.
- the program code may be implemented in a high-level programming language, such as, for example, C, C++, or Java. Alternatively, the program code may be implemented in assembly or machine language. In any case, the language may be a compiled or an interpreted language.
- the present invention is directed to systems and methods for improving the performance of a speech application deployed in an environment in which the written language of a user exhibits gender specific characteristics. It should be appreciated that changes could be made to the embodiments described above without departing from the inventive concepts thereof. It should be understood, therefore, that this invention is not limited to the particular embodiments disclosed, but it is intended to cover all modifications within the spirit and scope of the present invention as defined by the appended claims.
English Phrase | Male would say | Female would say
---|---|---
“I opened the window” | |
“I completed the exam” | |
“I was afraid” | |
“I came” | |
“I lost it” | |
Consequently, the designer of an ASR grammar to be used to recognize the speech of a Russian speaker may have to include representations of both the female and male versions of a given spoken phrase in order for the grammar to remain speaker independent, i.e., gender neutral.
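A minimal sketch of that design requirement follows: a gender-neutral grammar must accept the union of the male and female written forms of each phrase. The phrase pairs here are hypothetical transliterations used purely for illustration, not phrases taken from the patent.

```python
# Hypothetical male/female written forms of each English phrase
# (transliterated, illustrative only).
PHRASE_FORMS = {
    "I opened the window": {"male": "ya otkryl okno", "female": "ya otkryla okno"},
    "I came": {"male": "ya prishel", "female": "ya prishla"},
}

def gender_neutral_grammar(phrase_forms):
    """Build the speaker-independent grammar as the union of all gender
    variants, so the recognizer accepts either form of every phrase."""
    return sorted({form
                   for variants in phrase_forms.values()
                   for form in variants.values()})

entries = gender_neutral_grammar(PHRASE_FORMS)
print(len(entries))  # 4 — two variants for each of the two phrases
```

The same table of variants can later be read in reverse: once a variant is recognized, it reveals which gender-specific half of the grammar it came from.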
Component | Function | Examples
---|---|---
Prompt | Defines what the speech application says to the end user | Would you like to place an order?
Response | Defines every possible user response to the prompt, including its implications to the application (i.e., meaning, content) | YES (yes, yes please, certainly . . .); NO (No, not right now, no thanks . . .); HELP (Help, How do I do that . . .); OPERATOR (I need to talk to a person) . . .
Action | Defines the action to be performed for each response, based on current conditions | YES/System Available - go to PLACEORDER state; YES/System Unavailable - go to CALLBACKLATER state . . .
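The Prompt/Response/Action components in the table above might be represented in code roughly as follows. The state names, response categories, and actions come from the table; the data structure and function are assumptions made for illustration.

```python
# Hypothetical encoding of one dialogue state from the table above.
DIALOG_STATE = {
    "prompt": "Would you like to place an order?",
    "responses": {
        "YES": ["yes", "yes please", "certainly"],
        "NO": ["no", "not right now", "no thanks"],
        "HELP": ["help", "how do I do that"],
        "OPERATOR": ["I need to talk to a person"],
    },
    # Actions may depend on current conditions, e.g. system availability.
    "actions": {
        ("YES", "available"): "PLACEORDER",
        ("YES", "unavailable"): "CALLBACKLATER",
    },
}

def next_state(utterance, condition, state=DIALOG_STATE):
    """Map a recognized utterance to its response category, then to the
    action for that category under the current condition; if no action is
    defined, fall back to the category name itself."""
    for category, phrasings in state["responses"].items():
        if utterance.lower() in (p.lower() for p in phrasings):
            return state["actions"].get((category, condition), category)
    return None

print(next_state("yes please", "available"))   # PLACEORDER
print(next_state("certainly", "unavailable"))  # CALLBACKLATER
```

In the patented system, the grammar consulted while matching the utterance would itself be chosen per the user's identified gender, as described earlier.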
Claims (28)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/186,049 USH2187H1 (en) | 2002-06-28 | 2002-06-28 | System and method for gender identification in a speech application environment |
Publications (1)
Publication Number | Publication Date |
---|---|
USH2187H1 true USH2187H1 (en) | 2007-04-03 |
Family
ID=37897762
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/186,049 Abandoned USH2187H1 (en) | 2002-06-28 | 2002-06-28 | System and method for gender identification in a speech application environment |
Country Status (1)
Country | Link |
---|---|
US (1) | USH2187H1 (en) |
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5715396A (en) | 1992-10-13 | 1998-02-03 | Bay Networks, Inc. | Method for providing for automatic topology discovery in an ATM network or the like |
US5797122A (en) | 1995-03-20 | 1998-08-18 | International Business Machines Corporation | Method and system using separate context and constituent probabilities for speech recognition in languages with compound words |
US5870709A (en) | 1995-12-04 | 1999-02-09 | Ordinate Corporation | Method and apparatus for combining information from speech signals for adaptive interaction in teaching and testing |
US6157913A (en) * | 1996-11-25 | 2000-12-05 | Bernstein; Jared C. | Method and apparatus for estimating fitness to perform tasks based on linguistic and other aspects of spoken responses in constrained interactions |
US6931375B1 (en) * | 1997-05-27 | 2005-08-16 | Sbc Properties, Lp | Speaker verification method |
US6058166A (en) | 1997-10-06 | 2000-05-02 | Unisys Corporation | Enhanced multi-lingual prompt management in a voice messaging system with support for speech recognition |
US6122615A (en) * | 1997-11-19 | 2000-09-19 | Fujitsu Limited | Speech recognizer using speaker categorization for automatic reevaluation of previously-recognized speech data |
US6233317B1 (en) | 1997-12-11 | 2001-05-15 | Unisys Corporation | Multiple language electronic mail notification of received voice and/or fax messages |
US5953701A (en) | 1998-01-22 | 1999-09-14 | International Business Machines Corporation | Speech recognition models combining gender-dependent and gender-independent phone states and using phonetic-context-dependence |
JPH11296193A (en) * | 1998-04-06 | 1999-10-29 | Casio Comput Co Ltd | Voice synthesizer |
US6493671B1 (en) * | 1998-10-02 | 2002-12-10 | Motorola, Inc. | Markup language for interactive services to notify a user of an event and methods thereof |
US6684194B1 (en) * | 1998-12-03 | 2004-01-27 | Expanse Network, Inc. | Subscriber identification system |
Non-Patent Citations (1)
Title |
---|
Boroditsky, L. & Schmidt, L.A., "Sex, Syntax, and Semantics", 2000, Proceedings of the 22nd Annual Meeting of the Cognitive Science Society, pp. 1-6. * |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9584662B2 (en) * | 2004-04-16 | 2017-02-28 | At&T Intellectual Property Ii, L.P. | System and method for the automatic validation of dialog run time systems |
US9213692B2 (en) * | 2004-04-16 | 2015-12-15 | At&T Intellectual Property Ii, L.P. | System and method for the automatic validation of dialog run time systems |
US20050261901A1 (en) * | 2004-05-19 | 2005-11-24 | International Business Machines Corporation | Training speaker-dependent, phrase-based speech grammars using an unsupervised automated technique |
US7778830B2 (en) * | 2004-05-19 | 2010-08-17 | International Business Machines Corporation | Training speaker-dependent, phrase-based speech grammars using an unsupervised automated technique |
US7826605B1 (en) | 2005-01-20 | 2010-11-02 | Andre Denis Vanier | Method and system for integrating information from wireless and landline telephone systems |
US8553850B2 (en) | 2005-01-20 | 2013-10-08 | Grape Technology Group, Inc. | Method and system for providing information and advertising content in a telephone system |
US7933388B1 (en) | 2005-01-20 | 2011-04-26 | Andre Denis Vanier | Method and system for providing information and advertising content in a telephone system |
US7792257B1 (en) | 2005-01-20 | 2010-09-07 | Andre Denis Vanier | Method and system for determining gender and targeting advertising in a telephone system |
US8515047B1 (en) | 2005-04-20 | 2013-08-20 | Grape Technology Group, Inc. | Method and system for prioritizing the presentation of information within a directory assistance context wireless and landline telephone systems |
US7657433B1 (en) * | 2006-09-08 | 2010-02-02 | Tellme Networks, Inc. | Speech recognition accuracy with multi-confidence thresholds |
US8804696B1 (en) * | 2006-11-07 | 2014-08-12 | At&T Intellectual Property Ii, L.P. | Integrated gateway |
US20090086934A1 (en) * | 2007-08-17 | 2009-04-02 | Fluency Voice Limited | Device for Modifying and Improving the Behaviour of Speech Recognition Systems |
EP2028646A1 (en) * | 2007-08-17 | 2009-02-25 | Envox International Limited | Device for modifying and improving the behaviour of speech recognition systems |
US20090171663A1 (en) * | 2008-01-02 | 2009-07-02 | International Business Machines Corporation | Reducing a size of a compiled speech recognition grammar |
US7475344B1 (en) | 2008-05-04 | 2009-01-06 | International Business Machines Corporation | Genders-usage assistant for composition of electronic documents, emails, or letters |
US9230539B2 (en) | 2009-01-06 | 2016-01-05 | Regents Of The University Of Minnesota | Automatic measurement of speech fluency |
US8494857B2 (en) | 2009-01-06 | 2013-07-23 | Regents Of The University Of Minnesota | Automatic measurement of speech fluency |
US8392189B2 (en) * | 2009-09-28 | 2013-03-05 | Broadcom Corporation | Speech recognition using speech characteristic probabilities |
US20110077944A1 (en) * | 2009-09-28 | 2011-03-31 | Broadcom Corporation | Speech recognition module and applications thereof |
US20130204607A1 (en) * | 2011-12-08 | 2013-08-08 | Forrest S. Baker III Trust | Voice Detection For Automated Communication System |
US9583108B2 (en) * | 2011-12-08 | 2017-02-28 | Forrest S. Baker III Trust | Voice detection for automated communication system |
US9576593B2 (en) | 2012-03-15 | 2017-02-21 | Regents Of The University Of Minnesota | Automated verbal fluency assessment |
US20140081636A1 (en) * | 2012-09-15 | 2014-03-20 | Avaya Inc. | System and method for dynamic asr based on social media |
US9646604B2 (en) * | 2012-09-15 | 2017-05-09 | Avaya Inc. | System and method for dynamic ASR based on social media |
US10134391B2 (en) | 2012-09-15 | 2018-11-20 | Avaya Inc. | System and method for dynamic ASR based on social media |
US11615422B2 (en) | 2016-07-08 | 2023-03-28 | Asapp, Inc. | Automatically suggesting completions of text |
CN110100276A (en) * | 2016-12-22 | 2019-08-06 | 大众汽车有限公司 | The voice output sound of voice operating system |
CN107832304A (en) * | 2017-11-23 | 2018-03-23 | 珠海金山网络游戏科技有限公司 | A kind of method and system that user's sex is judged based on Message-text |
US11386259B2 (en) | 2018-04-27 | 2022-07-12 | Asapp, Inc. | Removing personal information from text using multiple levels of redaction |
US10747957B2 (en) * | 2018-11-13 | 2020-08-18 | Asapp, Inc. | Processing communications using a prototype classifier |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: UNISYS CORPORATION, PENNSYLVANIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:YUCHIMIUK, JOHN J.;REEL/FRAME:013085/0186 Effective date: 20020812 |
|
AS | Assignment |
Owner name: UNISYS CORPORATION, PENNSYLVANIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:YUCHIMIUK, JOHN J.;REEL/FRAME:013936/0142 Effective date: 20030610 |
|
AS | Assignment |
Owner name: CITIBANK, N.A., NEW YORK Free format text: SECURITY AGREEMENT;ASSIGNORS:UNISYS CORPORATION;UNISYS HOLDING CORPORATION;REEL/FRAME:018003/0001 Effective date: 20060531 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: UNISYS CORPORATION, PENNSYLVANIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:023312/0044 Effective date: 20090601 Owner name: UNISYS HOLDING CORPORATION, DELAWARE Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:023312/0044 Effective date: 20090601 |
|
AS | Assignment |
Owner name: DEUTSCHE BANK TRUST COMPANY AMERICAS, AS COLLATERAL TRUSTEE Free format text: PATENT SECURITY AGREEMENT (PRIORITY LIEN);ASSIGNOR:UNISYS CORPORATION;REEL/FRAME:023355/0001 Effective date: 20090731 |
|
AS | Assignment |
Owner name: DEUTSCHE BANK TRUST COMPANY AMERICAS, AS COLLATERAL TRUSTEE Free format text: PATENT SECURITY AGREEMENT (JUNIOR LIEN);ASSIGNOR:UNISYS CORPORATION;REEL/FRAME:023364/0098 Effective date: 20090731 |
|
AS | Assignment |
Owner name: UNISYS CORPORATION, PENNSYLVANIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:DEUTSCHE BANK TRUST COMPANY;REEL/FRAME:030004/0619 Effective date: 20121127 |
|
AS | Assignment |
Owner name: UNISYS CORPORATION, PENNSYLVANIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:DEUTSCHE BANK TRUST COMPANY AMERICAS, AS COLLATERAL TRUSTEE;REEL/FRAME:030082/0545 Effective date: 20121127 |
|
AS | Assignment |
Owner name: WELLS FARGO BANK, NATIONAL ASSOCIATION, AS COLLATERAL TRUSTEE, NEW YORK Free format text: PATENT SECURITY AGREEMENT;ASSIGNOR:UNISYS CORPORATION;REEL/FRAME:042354/0001 Effective date: 20170417 |
|
AS | Assignment |
Owner name: UNISYS CORPORATION, PENNSYLVANIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WELLS FARGO BANK, NATIONAL ASSOCIATION (SUCCESSOR TO GENERAL ELECTRIC CAPITAL CORPORATION);REEL/FRAME:044416/0358 Effective date: 20171005 |