CA2588604A1 - A system and method for improving recognition accuracy in speech recognition applications - Google Patents


Info

Publication number
CA2588604A1
Authority
CA
Canada
Prior art keywords
voice command
store
argument
verb
interpretations
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA002588604A
Other languages
French (fr)
Other versions
CA2588604C (en)
Inventor
Robert E. Shostak
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Stryker Corp
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Publication of CA2588604A1 publication Critical patent/CA2588604A1/en
Application granted granted Critical
Publication of CA2588604C publication Critical patent/CA2588604C/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L2015/228Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context

Abstract

A speech recognition system and method are provided to correctly distinguish among multiple interpretations of an utterance (202). The system is particularly useful when the set of possible interpretations is large, changes dynamically, and/or contains items that are not phonetically distinctive (203). The speech recognition system extends the capabilities of mobile wireless communication devices that are voice operated after their initial activation (206).
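The claims below describe the disambiguation step in detail: the recognizer's candidate interpretations are partitioned into "isomorphic sets" sharing a common verb, and each set is filtered against a per-user "inner circle" store of entities, with the whole set retained when no argument matches (claim 5). A minimal sketch of that logic, using hypothetical names (`best_fit`, `inner_circle`) not taken from the patent itself:

```python
from collections import defaultdict

def best_fit(interpretations, inner_circle):
    """Filter a recognizer's n-best list against a user's inner circle.

    interpretations: list of (verb, argument) tuples from the engine.
    inner_circle: set of entity names associated with the user.
    """
    # Partition into "isomorphic sets": groups with a common verb portion.
    by_verb = defaultdict(list)
    for verb, arg in interpretations:
        by_verb[verb].append((verb, arg))

    preferred = []
    for verb, group in by_verb.items():
        # Keep interpretations whose argument matches the inner circle;
        # if none match, keep the whole group unchanged (per claim 5).
        matching = [item for item in group if item[1] in inner_circle]
        preferred.extend(matching if matching else group)
    return preferred
```

For example, if the user's inner circle contains "Dr. Smith" but not the phonetically similar "Dr. Smyth", a "call" set containing both candidates collapses to the inner-circle match, while a set with no matches is passed through untouched.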

Claims (36)

1. A wireless communications system, comprising:
a controlling computer;
one or more wireless access points connected to the controlling computer by a network;
a badge that communicates using a wireless protocol with one of the wireless access points; and wherein the controlling computer further comprises a speech recognition system that receives a voice command from a particular user through the badge and interprets the voice command of the user to generate a set of voice command interpretations, the speech recognition system further comprising an inner circle mechanism having an inner circle store containing a list of entities associated with the particular user, the inner circle mechanism improving the accuracy of the set of voice command interpretations to generate a best fit voice command corresponding to the received voice command of the user.
2. The system of Claim 1, wherein the inner circle store further comprises one or more entities which are capable of being arguments to a voice command of the user, the entities further comprise a name of one or more of an individual person, a group of people, an organization and a group of organizations.
3. The system of Claim 1, wherein each voice command further comprises a verb portion and wherein the inner circle mechanism further comprises a partitioner that separates the set of the voice command interpretations into one or more isomorphic sets, each isomorphic set containing one or more voice command interpretations having a common verb portion and a filter that filters the one or more voice command interpretations in each isomorphic set to generate a preferred voice command interpretation for each isomorphic set.
4. The system of Claim 3, wherein the voice command further comprises an argument portion.
5. The system of Claim 4, wherein the filter further comprises a comparator that compares the argument of each voice command interpretation in each isomorphic set to the inner circle store and removes a particular voice command interpretation when the argument of the particular voice command interpretation does not match the inner circle store, unless none of the interpretations match the inner circle store, in which case none of the interpretations are removed.
6. The system of Claim 1, wherein the speech recognition system further comprises a say or spell mechanism that permits the user to spell a voice command, making it sufficiently phonetically distinct from other voice commands allowed in the grammar, in order to reduce the chance of an incorrect voice command interpretation.
7. The system of Claim 6, wherein each voice command further comprises a verb portion and an argument portion and wherein the say or spell mechanism further comprises a grammar store, the grammar store having for each verb of each voice command, a spelling of the verb of the voice command and for each argument of each voice command, a spelling of the argument so that the grammar store contains the combination of each spoken or spelled verb and each spoken or spelled argument for the voice commands.
8. The system of Claim 7, wherein the speech recognition system further comprises a mechanism that permits the user to spell a voice command verb and argument using the grammar store of the say or spell mechanism.
9. A wireless communications system, comprising:
a controlling computer;
one or more wireless access points connected to the controlling computer by a network;
a badge that communicates using a wireless protocol with one of the wireless access points; and wherein the controlling computer further comprises a speech recognition system that receives a voice command from a particular user through the badge and interprets the voice command of the user to generate a set of resulting voice command interpretations, the speech recognition system further comprising a say or spell mechanism that permits the user to spell a voice command, making it sufficiently phonetically distinct from other voice commands allowed in the grammar, in order to reduce the chance of an incorrect voice command interpretation.
10. The system of Claim 9, wherein each voice command further comprises a verb portion and an argument portion and wherein the say or spell mechanism further comprises a grammar store, the grammar store having for each verb of each voice command, a spelling of the verb of the voice command and for each argument of each voice command, a spelling of the argument so that the grammar store contains the combination of each verb and each argument for the voice commands.
11. The system of Claim 10, wherein the speech recognition system further comprises a spelling mechanism that permits the user to spell a voice command verb and argument using the grammar store of the say or spell mechanism.
12. A method for improving the accuracy of a computer-implemented speech recognition system, the method comprising:
obtaining a set of voice command interpretations from a speech recognition engine, each voice command interpretation having a verb portion, partitioning the set of voice command interpretations into one or more isomorphic sets, each isomorphic set containing one or more voice command interpretations having a common verb portion, filtering the one or more voice command interpretations in each isomorphic set to generate a preferred voice command interpretation; and outputting the preferred voice command interpretation.
13. The method of Claim 12, wherein the voice command interpretation further comprises an argument portion.
14. The method of Claim 13, wherein the filtering the one or more voice command interpretations in each isomorphic set further comprises comparing the argument of each voice command interpretation in each isomorphic set to a store of arguments and removing a particular voice command interpretation when the argument of the particular voice command interpretation does not match an entity in the store of arguments, unless none of the interpretations match the store of arguments, in which case none of the interpretations are removed.
15. The method of Claim 14, wherein the store of arguments further comprises an entity store for a particular user from which a voice command has been issued, the store of entities further comprises a name of one or more of an individual person, a group of people, an organization and a group of organizations.
16. The method of Claim 15 further comprising generating an entity store for each user.
17. The method of Claim 16, wherein generating the entity store further comprises one or more of manually setting an entity store for a user, setting an entity store of the user based on a buddy list of the user, setting an entity store of the user based on the department of the user and automatically setting an entity store for the user based on the arguments of prior voice commands issued by the user.
18. The method of Claim 16 further comprising removing an entity from the entity store for the user.
19. The method of Claim 18, wherein removing an entity from the entity store further comprises one or more of manually removing an entity from the entity store and automatically removing an entity from the entity store when the entity has not appeared as an argument for a voice command of the user for a predetermined time period.
20. The method of Claim 13, wherein obtaining the set of voice command interpretations further comprises spelling the verb portion of the voice command to distinguish similar voice command interpretations and generate a set of voice command interpretations and spelling the argument portion of the voice command to further distinguish similar voice command interpretations to generate a reduced set of voice command interpretations.
21. The method of Claim 20, wherein obtaining the set of voice command interpretations further comprises storing a grammar store wherein the grammar store further comprises, for each verb of each voice command, a spelling of the verb of the voice command and for each argument of each voice command, a spelling of the argument so that the grammar store contains the combination of each verb and each argument for the voice commands.
22. The method of Claim 21, wherein the spelling steps further comprise comparing a spelled voice command to the grammar store.
23. A method for improving the accuracy of a computer-implemented speech recognition system, the method comprising:
receiving a voice command from a user, the voice command having a verb portion and an argument portion;
spelling the verb portion of the voice command to distinguish similar voice command interpretations and generate a set of voice command interpretations; and spelling the argument portion of the voice command to further distinguish similar voice command interpretations to generate a reduced set of voice command interpretations.
24. The method of Claim 23 further comprising storing a grammar store wherein the grammar store further comprises, for each verb of each voice command, a spelling of the verb of the voice command and for each argument of each voice command, a spelling of the argument so that the grammar store contains the combination of each verb and each argument for the voice commands.
25. The method of Claim 24, wherein the spelling steps further comprise comparing a spelled voice command to the grammar store.
26. A computer implemented speech recognition system, comprising:
a speech recognition engine that generates a set of voice command interpretations based on a voice command of a user; and an inner circle mechanism, connected to the speech recognition engine, the inner circle mechanism having an inner circle store containing a list of entities associated with the user, the inner circle mechanism improving the accuracy of the set of voice command interpretations to generate a best fit voice command corresponding to the voice command of the user.
27. The system of Claim 26, wherein the inner circle store further comprises one or more entities which are arguments to a voice command of the user, the entities further comprise a name of one or more of an individual person, a group of people, an organization and a group of organizations.
28. The system of Claim 27, wherein each voice command further comprises a verb portion, the inner circle mechanism further comprises a partitioner that separates the set of the voice command interpretations into one or more isomorphic sets, each isomorphic set containing one or more voice command interpretations having a common verb portion and a filter that filters the one or more voice command interpretations in each isomorphic set to generate one or more preferred voice command interpretations for each isomorphic set.
29. The system of Claim 28, wherein the voice command further comprises an argument portion.
30. The system of Claim 29, wherein the filter further comprises a comparator that compares the argument of each voice command interpretation in each isomorphic set to the inner circle store and removes a particular voice command interpretation when the argument of the particular voice command interpretation does not match the entities in the inner circle store.
31. The system of Claim 26, wherein the speech recognition engine further comprises a say or spell mechanism that permits the user to spell a voice command, making it sufficiently phonetically distinct from other voice commands allowed in the grammar, in order to reduce the chance of incorrect voice command interpretations.
32. The system of Claim 31, wherein each voice command further comprises a verb portion and an argument portion and wherein the say or spell mechanism further comprises a grammar store, the grammar store having for each verb of each voice command, a spelling of the verb of the voice command and for each argument of each voice command, a spelling of the argument so that the grammar store contains the combination of each verb and each argument for the voice commands.
33. The system of Claim 32, wherein the speech recognition system further comprises a spelling mechanism that permits the user to spell a voice command verb and argument using the grammar store of the say or spell mechanism.
34. A computer implemented speech recognition system, comprising:
a speech recognition engine that generates a set of voice command interpretations based on a voice command of a user; and the speech recognition engine further comprises a say or spell mechanism that permits the user to spell a voice command, making it sufficiently phonetically distinct from other voice commands allowed in the grammar, in order to reduce the chance of incorrect voice command interpretations.
35. The system of Claim 34, wherein each voice command further comprises a verb portion and an argument portion and wherein the say or spell mechanism further comprises a grammar store, the grammar store having for each verb of each voice command, a spelling of the verb of the voice command and for each argument of each voice command, a spelling of the argument so that the grammar store contains the combination of each verb and each argument for the voice commands.
36. The system of Claim 35, wherein the speech recognition system further comprises a spelling mechanism that permits the user to spell a voice command verb and argument using the grammar store of the say or spell mechanism.
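The "say or spell" claims (7, 10, 21, 24, 32, 35) describe a grammar store that pairs every spoken or spelled verb with every spoken or spelled argument, so a spelled utterance like "c a l l  s m i t h" is phonetically distinct from similar commands. A minimal sketch of constructing such a grammar, with hypothetical names (`build_say_or_spell_grammar`) not drawn from the patent:

```python
def build_say_or_spell_grammar(verbs, arguments):
    """Enumerate every combination of spoken-or-spelled verb and
    spoken-or-spelled argument, as described in the say-or-spell claims."""
    def forms(word):
        # A token may be spoken as-is ("call") or spelled letter by
        # letter ("c a l l"); both forms enter the grammar.
        return [word.lower(), " ".join(word.lower())]

    grammar = set()
    for verb in verbs:
        for arg in arguments:
            for verb_form in forms(verb):
                for arg_form in forms(arg):
                    grammar.add(f"{verb_form} {arg_form}")
    return grammar
```

With one verb and one argument this yields four entries (spoken/spoken, spoken/spelled, spelled/spoken, spelled/spelled); a recognizer constrained to this grammar can then match a spelled command exactly, as claim 22 describes.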
CA2588604A 2004-11-30 2005-11-29 A system and method for improving recognition accuracy in speech recognition applications Active CA2588604C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US11/000,590 2004-11-30
US11/000,590 US7457751B2 (en) 2004-11-30 2004-11-30 System and method for improving recognition accuracy in speech recognition applications
PCT/US2005/043235 WO2006060443A2 (en) 2004-11-30 2005-11-29 A system and method for improving recognition accuracy in speech recognition applications

Publications (2)

Publication Number Publication Date
CA2588604A1 (en) 2006-06-08
CA2588604C CA2588604C (en) 2012-07-17

Family

ID=36565667

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2588604A Active CA2588604C (en) 2004-11-30 2005-11-29 A system and method for improving recognition accuracy in speech recognition applications

Country Status (4)

Country Link
US (2) US7457751B2 (en)
EP (1) EP1820182A4 (en)
CA (1) CA2588604C (en)
WO (1) WO2006060443A2 (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8498865B1 (en) 2004-11-30 2013-07-30 Vocera Communications, Inc. Speech recognition system and method using group call statistics
US20080084312A1 (en) * 2006-10-10 2008-04-10 Daily Michael A Radio frequency identification layered foam tag
US8589162B2 (en) * 2007-09-19 2013-11-19 Nuance Communications, Inc. Method, system and computer program for enhanced speech recognition of digits input strings
US8224656B2 (en) * 2008-03-14 2012-07-17 Microsoft Corporation Speech recognition disambiguation on mobile devices
US9280971B2 (en) 2009-02-27 2016-03-08 Blackberry Limited Mobile wireless communications device with speech to text conversion and related methods
ATE544291T1 (en) 2009-02-27 2012-02-15 Research In Motion Ltd MOBILE RADIO COMMUNICATION DEVICE WITH SPEECH TO TEXT CONVERSION AND ASSOCIATED METHODS
US9183834B2 (en) * 2009-07-22 2015-11-10 Cisco Technology, Inc. Speech recognition tuning tool
US10097880B2 (en) * 2009-09-14 2018-10-09 Tivo Solutions Inc. Multifunction multimedia device
US8447261B2 (en) 2009-10-27 2013-05-21 At&T Mobility Ii Llc Integrating multimedia and voicemail
US20110137976A1 (en) * 2009-12-04 2011-06-09 Bob Poniatowski Multifunction Multimedia Device
US8682145B2 (en) 2009-12-04 2014-03-25 Tivo Inc. Recording system based on multimedia content fingerprints
US8626511B2 (en) * 2010-01-22 2014-01-07 Google Inc. Multi-dimensional disambiguation of voice commands
US20110282942A1 (en) * 2010-05-13 2011-11-17 Tiny Prints, Inc. Social networking system and method for an online stationery or greeting card service
WO2013071305A2 (en) * 2011-11-10 2013-05-16 Inventime Usa, Inc. Systems and methods for manipulating data using natural language commands
US9317605B1 (en) 2012-03-21 2016-04-19 Google Inc. Presenting forked auto-completions
US9646606B2 (en) 2013-07-03 2017-05-09 Google Inc. Speech recognition using domain knowledge
US9466296B2 (en) 2013-12-16 2016-10-11 Intel Corporation Initiation of action upon recognition of a partial voice command
US20180358004A1 (en) * 2017-06-07 2018-12-13 Lenovo (Singapore) Pte. Ltd. Apparatus, method, and program product for spelling words
US11798544B2 (en) * 2017-08-07 2023-10-24 Polycom, Llc Replying to a spoken command
US10957445B2 (en) 2017-10-05 2021-03-23 Hill-Rom Services, Inc. Caregiver and staff information system

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5067149A (en) 1987-10-05 1991-11-19 Ambassador College Telephone line communications control system with dynamic call streaming
US5267305A (en) 1987-10-05 1993-11-30 Ambassador College Transparent inband signaling
DE69230905T2 (en) 1991-06-28 2000-08-17 Ericsson Telefon Ab L M USER MODULES IN TELECOMMUNICATION SYSTEMS
US5515426A (en) * 1994-02-28 1996-05-07 Executone Information Systems, Inc. Telephone communication system having a locator
US5596634A (en) * 1994-12-13 1997-01-21 At&T Telecommunications system for dynamically selecting conversation topics having an automatic call-back feature
US6381341B1 (en) * 1996-05-16 2002-04-30 Digimarc Corporation Watermark encoding method exploiting biases inherent in original signal
US5987408A (en) * 1996-12-16 1999-11-16 Nortel Networks Corporation Automated directory assistance system utilizing a heuristics model for predicting the most likely requested number
US6085976A (en) * 1998-05-22 2000-07-11 Sehr; Richard P. Travel system and methods utilizing multi-application passenger cards
US5987410A (en) * 1997-11-10 1999-11-16 U.S. Philips Corporation Method and device for recognizing speech in a spelling mode including word qualifiers
US6480597B1 (en) 1998-06-12 2002-11-12 Mci Communications Corporation Switch controller for a telecommunications network
CA2361726A1 (en) * 1999-02-04 2000-08-10 Apion Telecoms Limited A telecommunications gateway
DE19944608A1 (en) * 1999-09-17 2001-03-22 Philips Corp Intellectual Pty Recognition of spoken speech input in spelled form
US6761637B2 (en) * 2000-02-22 2004-07-13 Creative Kingdoms, Llc Method of game play using RFID tracking device
US6711414B1 (en) * 2000-02-25 2004-03-23 Charmed Technology, Inc. Wearable computing device capable of responding intelligently to surroundings
US6937986B2 (en) * 2000-12-28 2005-08-30 Comverse, Inc. Automatic dynamic speech recognition vocabulary based on external sources of information
US6728681B2 (en) * 2001-01-05 2004-04-27 Charles L. Whitham Interactive multimedia book
US6731737B2 (en) * 2001-02-27 2004-05-04 International Business Machines Corporation Directory assistance system
US6901255B2 (en) * 2001-09-05 2005-05-31 Vocera Communications Inc. Voice-controlled wireless communications system and method
US7172113B2 (en) * 2002-09-16 2007-02-06 Avery Dennison Corporation System and method for creating a display card
US20040162724A1 (en) 2003-02-11 2004-08-19 Jeffrey Hill Management of conversations
US7606714B2 (en) 2003-02-11 2009-10-20 Microsoft Corporation Natural language classification within an automated response system
US7302392B1 (en) 2003-10-07 2007-11-27 Sprint Spectrum L.P. Voice browser with weighting of browser-level grammar to enhance usability
US20050089150A1 (en) 2003-10-28 2005-04-28 Birkhead Mark W. Voice enabled interactive drug and medical information system
US7319386B2 (en) * 2004-08-02 2008-01-15 Hill-Rom Services, Inc. Configurable system for alerting caregivers

Also Published As

Publication number Publication date
US20090043587A1 (en) 2009-02-12
EP1820182A4 (en) 2009-12-09
US7457751B2 (en) 2008-11-25
EP1820182A2 (en) 2007-08-22
US8175887B2 (en) 2012-05-08
CA2588604C (en) 2012-07-17
WO2006060443A3 (en) 2007-05-10
WO2006060443A2 (en) 2006-06-08
US20060116885A1 (en) 2006-06-01

Similar Documents

Publication Publication Date Title
CA2588604A1 (en) A system and method for improving recognition accuracy in speech recognition applications
DE102017121086B4 (en) INTERACTIVE VOICE ACTIVATED DEVICES
WO2007140047A3 (en) Grammar adaptation through cooperative client and server based speech recognition
DE102019111529A1 (en) AUTOMATED LANGUAGE IDENTIFICATION USING A DYNAMICALLY ADJUSTABLE TIME-OUT
WO2004023455A3 (en) Methods, systems, and programming for performing speech recognition
US9817809B2 (en) System and method for treating homonyms in a speech recognition system
CN106502649A (en) A kind of robot service awakening method and device
WO2002054033A3 (en) Hierarchical language models for speech recognition
CN103903627A (en) Voice-data transmission method and device
WO2004072926A3 (en) Management of conversations
EP1435605A3 (en) Method and apparatus for speech recognition
WO2007118100A3 (en) Automatic language model update
WO2004090866A3 (en) Phonetically based speech recognition system and method
WO2008083173A3 (en) Local storage and use of search results for voice-enabled mobile communications devices
EP1211873A3 (en) Advanced voice recognition phone interface for in-vehicle speech recognition applications
EP1933301A3 (en) Speech recognition method and system with intelligent speaker identification and adaptation
CN109725523A (en) A kind of method and wrist-watch of smartwatch speech recognition controlled
CN106601250A (en) Speech control method and device and equipment
CN105047196B (en) Speech artefacts compensation system and method in speech recognition system
EP1352390B1 (en) Automatic dialog system with database language model
WO2019075829A1 (en) Voice translation method and apparatus, and translation device
CN111179903A (en) Voice recognition method and device, storage medium and electric appliance
CN107767860A (en) A kind of voice information processing method and device
WO2002103675A8 (en) Client-server based distributed speech recognition system architecture
CN107886940B (en) Voice translation processing method and device

Legal Events

Date Code Title Description
EEER Examination request