US20010056344A1 - Command boundary identifier for conversational natural language - Google Patents

Command boundary identifier for conversational natural language

Info

Publication number
US20010056344A1
Authority
US
United States
Prior art keywords
command
recognized text
recited
boundary
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US09/181,322
Other versions
US6453292B2 (en)
Inventor
Ganesh N. Ramaswamy
Jan Kleindienst
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nuance Communications Inc
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Priority to US09/181,322 priority Critical patent/US6453292B2/en
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION reassignment INTERNATIONAL BUSINESS MACHINES CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: KLEINDIENST, JAN, RAMASWAMY, GANESH N.
Priority to JP23021799A priority patent/JP3476006B2/en
Priority to GB9921422A priority patent/GB2343285B/en
Priority to CN99121518.4A priority patent/CN1125436C/en
Publication of US20010056344A1 publication Critical patent/US20010056344A1/en
Application granted granted Critical
Publication of US6453292B2 publication Critical patent/US6453292B2/en
Assigned to NUANCE COMMUNICATIONS, INC. reassignment NUANCE COMMUNICATIONS, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: INTERNATIONAL BUSINESS MACHINES CORPORATION
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/08: Speech classification or search
    • G10L 2015/088: Word spotting
    • G10L 2015/223: Execution procedure of a spoken command

Abstract

An apparatus for automatically identifying command boundaries in a conversational natural language system, in accordance with the present invention, includes a speech recognizer for converting an input signal to recognized text and a boundary identifier coupled to the speech recognizer for receiving the recognized text and determining if a command is present in the recognized text, the boundary identifier outputting the command if present in the recognized text. A method for identifying command boundaries in a conversational natural language system is also included.

Description

    BACKGROUND OF THE INVENTION
  • 1. Field of the Invention [0001]
  • The present invention relates to speech recognition and, more particularly, to an apparatus and method for identifying command boundaries from natural conversational speech. [0002]
  • 2. Description of the Related Art [0003]
  • Natural language user interface systems include systems which permit a speaker to input commands to the system by saying the commands. However, state-of-the-art conversational natural language user interface systems typically require the user to indicate the end of a command, or the command boundary, through some form of manual input, such as pausing between commands or clicking a microphone control button on the display. Such a requirement makes the user interface quite cumbersome to use and may result in unwanted delays. [0004]
  • Therefore, a need exists for a trainable system that can automatically identify command boundaries in a conversational natural language user interface. [0005]
  • SUMMARY OF THE INVENTION
  • An apparatus for automatically identifying command boundaries in a conversational natural language system, in accordance with the present invention, includes a speech recognizer for converting an input signal to recognized text and a boundary identifier coupled to the speech recognizer for receiving the recognized text and determining if a command is present in the recognized text, the boundary identifier outputting the command if present in the recognized text. [0006]
  • In alternate embodiments, the boundary identifier may output to an application which executes the command. The boundary identifier may include an input processor for processing the recognized text. The input processor may process the recognized text by augmenting each word in the recognized text by the word's relative position with respect to a hypothesized command boundary. The boundary identifier may further include a feature detector coupled to the input processor, the feature detector for determining which feature functions, from a set of feature functions, are present in the processed recognized text. The boundary identifier may further include a decision maker for determining if a command is present in the processed recognized text according to a set of feature weights corresponding to the feature functions in the processed recognized text. The decision maker may be coupled to the feature detector and may decide if the processed recognized text includes a command boundary. [0007]
  • In still other embodiments, a training system for training the apparatus to recognize text and to recognize complete commands may be included. The training system may include an input processor for processing a collection of training data comprising utterances which include complete commands and other than complete commands. The input processor may insert a token before each utterance in the training data. The input processor may insert a token before a first utterance in the recognized text, and after every command in the recognized text. A feature extractor may be included for extracting feature functions including words and relative positions of the words with respect to a hypothesized command boundary location. The speech recognizer may include a language model that has been trained using training data, the training data including a token inserted to indicate a location of a command boundary in the training data. The speech recognizer may include additional baseforms for the token. The speech recognizer may produce the recognized text including the token. The boundary identifier may declare a command boundary when there is an extended period of silence in the recognized text. [0008]
  • A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for identifying commands in recognized text, is provided. The method steps include inputting recognized text, processing the recognized text by augmenting words of the recognized text with a position relative to a hypothesized command boundary, determining feature functions in the processed recognized text in accordance with a set of feature functions, deciding whether the processed recognized text with feature functions identified includes a command, the decision being made based on weighting of feature functions, and, if a command is included, outputting the command. [0009]
  • In alternate embodiments, a program of instructions for training the program storage device by inputting training data including utterances comprising commands and other than commands may be included. The step of placing a token before each utterance may be included. The step of placing a token after each command boundary included in the utterances may also be included. The program of instructions for training the program storage device may include the step of extracting feature functions from the training data. The program of instructions for training the program storage device may include the step of determining feature weights for all feature functions. The program of instructions for processing the recognized text may include the step of placing a token before a first utterance in the recognized text and after each command in the recognized text. The program storage device may further include a speech recognizer for providing the recognized text. [0010]
  • A method for identifying commands in natural conversational language includes the steps of inputting recognized text, processing the recognized text by augmenting words of the recognized text with a position relative to a hypothesized command boundary, determining feature functions in the processed recognized text in accordance with a set of feature functions, deciding whether the processed recognized text with feature functions identified includes a command, the decision being made based on weighting of feature functions, and, if a command is included, outputting the command. [0011]
  • In other methods, the step of inputting training data including utterances comprising commands and other than commands may be included. The step of placing a token before each utterance of the training data may also be included. The method may further include the step of placing a token after command boundaries included in the utterances. The method may include the step of extracting feature functions from the training data. The method may further include the step of determining feature weights for all feature functions. The step of placing a token before a first utterance in the recognized text and after each command in the recognized text may be included. The step of outputting the command to a device for executing the command, the device including a speech recognizer for providing the recognized text, may also be included. [0012]
  • These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings. [0013]
  • BRIEF DESCRIPTION OF DRAWINGS
  • The invention will be described in detail in the following description of preferred embodiments with reference to the following figures wherein: [0014]
  • FIG. 1 is a block/flow diagram of a system/method which includes a boundary identifier, according to the present invention; [0015]
  • FIG. 2 is a block/flow diagram of an application that uses complete commands generated by a boundary identifier, according to the present invention; [0016]
  • FIG. 3 is a block/flow diagram of a boundary identifier, according to the present invention; [0017]
  • FIG. 4 is a block/flow diagram of an apparatus that generates feature functions and feature weights used by the boundary identifier, according to the present invention; and [0018]
  • FIG. 5 is a block diagram of a speech recognizer that generates recognized text to be used by the boundary identifier, according to the present invention. [0019]
  • DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
  • The present invention relates to speech recognition and, more particularly, to an apparatus and method for identifying command boundaries from natural conversational speech. The present invention includes a trainable system which automatically identifies command words or phrases in conversational natural language. The invention provides a more user-friendly interface which permits a user to speak more naturally and continuously without manually indicating command boundaries. A maximum entropy identification model is preferably used, built from training data in which all correct command boundaries are marked. During training, a set of features and their weights are iteratively selected using the training data. The features include words and phrases, as well as their relative positions to potential command boundaries of the speech. Alternate embodiments of the present invention include a more effective language model used to generate additional useful tokens for the identification model. [0020]
  • The present invention provides an apparatus that can automatically identify the command boundaries in a conversational natural language user interface. Advantageously, the present invention is trainable with additional data to improve performance, or with data from a new domain to allow the use of the apparatus in a new domain. The present invention may also identify and separate multiple commands included in a single utterance. The present invention uses minimal computational resources during identification to allow its use in a real-time system. [0021]
  • The present invention uses statistical techniques both from natural language understanding and from speech recognition. [0022]
  • Preferably by using a maximum entropy identification model, training data is first marked with the command boundaries. For each command boundary, all the surrounding words within a window (including words which are both to the left and to the right of the boundary) are marked to indicate their relative position with respect to the boundary. The training data which is thus processed is then subjected to maximum entropy style feature extraction, with the features including words and phrases, as well as their relative position to the boundary. The corresponding weights for the features are estimated using an iterative algorithm. During decoding, the test sentences are processed similarly to mark the relative position of each of the words in the current string, with respect to a hypothesized location of the command boundary. When possible, words occurring after the hypothesized location of the boundary are also marked. Then, the decision of whether or not to declare the hypothesized location as a command boundary is made by examining the product of the weights for the features that are present. [0023]
  • The present invention also includes ways to strengthen the maximum entropy identification model. One such enhancement includes using a more effective language model at the speech recognition stage. All the command boundaries in the language model training data are advantageously marked with a token, and an additional set of baseforms for the boundary (most of the baseforms corresponding to various forms of silence) are included in the model. With this addition, the speech recognition engine produces a string of text with additional tokens to suggest potential command boundaries. Other enhancements to the identification model, such as taking advantage of extended periods of silence, are also described. [0024]
  • Besides identifying the command boundary, the present invention may be used to recognize multiple commands in the same sentence. This alleviates the need to construct and support compound commands, since sentences including multiple commands may be automatically decomposed using the same command boundary identification process. It should be understood that the elements shown in FIGS. 1-5 may be implemented in various forms of hardware, software or combinations thereof. Preferably, these elements are implemented in software on one or more appropriately programmed general purpose digital computers having a processor and memory and input/output interfaces. Referring now to the drawings, in which like numerals represent the same or similar elements, and initially to FIG. 1, a flow/block diagram is shown of an example of a system 8 that includes a boundary identifier, according to the present invention. An audio input 10 is generated by a user of system 8, and is in the form of a spoken command issued to system 8. For example, if the system is an electronic mail application, then an example of a command issued by the user may be "check new mail" or "show me the next message". Audio input 10 is converted to recognized text 30 by a speech recognizer 20. The construction of speech recognizer 20 is known to those skilled in the art. Recognized text 30 is an input to a boundary identifier 40, which generates a complete command 50 as output. If recognized text 30 is a complete command, recognized text 30 is sent as the output. If recognized text 30 is not a complete command, then no output is sent. For the electronic mail application, examples of recognized text that may be a complete command are "check new mail" and "show me the next message", and examples of recognized text that are not a complete command are "check new", "show me the" and "check new mail show". Complete command 50 is used by an application 60. Application 60 is preferably a software application, and complete commands 50 may be used to open the software application and otherwise interface therewith. The present invention finds utility in many applications; for example, system 8 may interface with mechanical equipment or electronic devices. System 8 can translate verbal commands or audio signals into executable signals to, for example, turn on/off an appliance or adjust features or functions of the equipment/devices. [0025]
  • Referring to FIG. 2, a block/flow diagram of an example of an application 60 that uses a complete command is shown. Application 60 preferably includes a natural language understanding system 61 and a command executor 62. The information included in complete command 50 is analyzed and interpreted by natural language understanding system 61 to generate a formal command, and the formal command is executed by command executor 62. For example, if the complete command is "do I have any new messages", then the natural language understanding system may translate this into a command such as CheckNewMessage(), and submit the command to command executor 62. [0026]
  • Referring to FIG. 3, a block/flow diagram of an example of a boundary identifier 40 is shown. Boundary identifier 40 takes recognized text 30 as an input and produces complete command 50 as the output. Boundary identifier 40 includes feature functions 41, feature weights 42, a feature detector 43, an input processor 44 and a decision maker 45. Boundary identifier 40 will now be described by way of example; other boundary symbols may be used within the scope of the invention. Given one or more words that comprise recognized text 30, which may be denoted as S, boundary identifier 40 decides if S is a complete command. The decision, denoted by T, is set to T=1 if S is a complete command, and it is set to T=0 otherwise. If S is a complete command, recognized text 30 is sent as the output of boundary identifier 40, which is complete command 50. Boundary identifier 40, therefore, is responsible for evaluating the conditional probability P(T|S) for both values of T and selecting as the decision that T which maximizes P(T|S). [0027]
  • Boundary identifier 40 needs a model built from training data that can generate the values P(T|S). The present invention preferably generates the values P(T|S) by using a maximum entropy principle as described by A. Berger et al., "A Maximum Entropy Approach to Natural Language Processing," Computational Linguistics, Vol. 22, No. 1, pp. 39-71, March 1996, incorporated herein by reference. Other components, such as feature detector 43, input processor 44 and decision maker 45, as well as feature functions 41 and feature weights 42, will be described in greater detail below. [0028]
  • Referring to FIG. 4, a block/flow diagram of an example of maximum entropy model construction is shown. Training data 70 includes a large number of training utterances relevant to the domain, corresponding to complete commands. From these utterances, a set of utterances that do not correspond to complete commands is generated, and these utterances are also added to training data 70. For every entry in this augmented set of training data, the correct decision (T=0 or T=1) may also be determined. For the electronic mail example discussed earlier, where the utterance "check new mail" was followed by "show me the first message", the following entries may be made in the training data: [0029]
  • check // T=0 [0030]
  • check new // T=0 [0031]
  • check new mail // T=1 [0032]
  • check new mail show // T=0 [0033]
  • check new mail show me // T=0 [0034]
  • In the last two entries, words from a subsequent utterance have been added, namely “show” and “show me”. Such entries are sometimes desirable to resolve certain ambiguities that may arise. For example, utterances such as “delete”, “delete this” and “delete this one” are all complete commands. In these cases, although “delete” by itself may be a complete command, it is not so when followed by “this”, and similarly, “delete this” is not a complete command when followed by “one”. Hence, this “look ahead” step is necessary, and the number of words to look ahead, also known as a window size of the look ahead step, is one of the parameters of the present invention. In one embodiment of the invention, a window size of two words is provided, although other window sizes may be included depending on the application. [0035]
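  • The entry generation just described is mechanical enough to sketch in code. The following Python fragment is an illustrative sketch only, not the patent's implementation; the function name and data layout are assumptions. It expands a stream of complete commands into (prefix, T) training entries, labeling T=1 exactly at the command boundaries.

```python
# Illustrative sketch of the training-entry generation described above.
# The function name and data layout are assumptions, not the patent's.

def make_training_entries(commands):
    """Expand a stream of complete commands into (prefix, T) pairs,
    where T = 1 exactly when the prefix ends at a command boundary."""
    words = [w for cmd in commands for w in cmd]
    boundaries, pos = set(), 0
    for cmd in commands:
        pos += len(cmd)
        boundaries.add(pos)  # a boundary sits just after the pos-th word
    return [(words[:end], 1 if end in boundaries else 0)
            for end in range(1, len(words) + 1)]

# The electronic mail example from the text:
for prefix, t in make_training_entries(
        [["check", "new", "mail"],
         ["show", "me", "the", "first", "message"]]):
    print(" ".join(prefix), "// T=%d" % t)
# check // T=0
# check new // T=0
# check new mail // T=1
# check new mail show // T=0
# ...
```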
  • Input processor 44 processes training data 70. For every possible position of the command boundary, also known as the hypothesized location of the command boundary, input processor 44 augments each word in the training set with −n if the word is n positions to the left of the hypothesized command boundary, and with +n if the word is n positions to the right of the hypothesized command boundary. After the processing by input processor 44, the entries in the processed training set will look like: [0036]
  • check−1 // T=0 [0037]
  • check−2 new−1 // T=0 [0038]
  • check−3 new−2 mail−1 // T=1 [0039]
  • check−4 new−3 mail−2 show−1 // T=0 [0040]
  • check−5 new−4 mail−3 show−2 me−1 // T=0 [0041]
  • check−3 new−2 mail−1 show+1 // T=1 [0042]
  • check−3 new−2 mail−1 show+1 me+2 // T=1 [0043]
  • check−4 new−3 mail−2 show−1 me+1 // T=0 [0044]
  • In the above example, the additional entries have been added to accommodate the look ahead process described earlier. [0045]
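  • As a concrete illustration of this augmentation step, the sketch below tags each word with its offset from a hypothesized boundary and keeps up to two look-ahead words, matching the window size discussed above. The function and argument names are assumptions, not the patent's.

```python
# Hedged sketch of the position augmentation performed by input processor 44.
# `lookahead` mirrors the two-word window discussed in the text.

def augment(words, boundary, lookahead=2):
    """Tag each word with its position relative to a hypothesized command
    boundary placed after words[boundary - 1]."""
    tagged = []
    for i, w in enumerate(words[:boundary + lookahead]):
        offset = i - boundary          # negative to the left of the boundary
        tagged.append(f"{w}{offset}" if offset < 0 else f"{w}+{offset + 1}")
    return tagged

print(augment(["check", "new", "mail", "show", "me"], boundary=3))
# ['check-3', 'new-2', 'mail-1', 'show+1', 'me+2']
```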
  • Turning again to FIG. 4, input processor 44 processes the training data 70, and the processed training data is used by feature extractor 46 to produce feature functions 41. [0046] In one embodiment of this invention, feature functions of the form

    $$f^{i}_{t,s}(T, S) = \begin{cases} 1 & \text{if } t = T \text{ and } s \subseteq S \\ 0 & \text{otherwise} \end{cases} \qquad \text{(EQ. 1)}$$

    are used, where i is the index of the feature, with i = 1, ..., n, and the total number of features is n. The feature functions include one or more words from the processed training data, along with the correct decision. For example, consider the feature [0047]
  • f (new−2 mail−1), (T=1)
  • This feature is used if the utterance S includes the words "new" and "mail" at the second and first positions, respectively, to the left of the hypothesized command boundary, for the case where T=1. The total number of features, n, is a parameter of the invention, and its value depends on the application. Each feature function 41 includes one or more words with the relative positions augmented, along with the corresponding decision (T=0 or T=1). The selection of feature functions 41 from training data 70 processed by the input processor is known in the art, and may be done as described in Papineni et al., "Feature-Based Language Understanding," EUROSPEECH, Rhodes, Greece, 1997, incorporated herein by reference. [0048]
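  • To make the indicator form of EQ. 1 concrete, the sketch below represents a feature as a set of position-tagged words paired with a decision and tests whether the feature fires on a processed utterance. The representation is an assumption; the patent fixes only the indicator-function form.

```python
# Sketch of an EQ. 1 style indicator feature: it fires (returns 1) when the
# candidate decision matches and every tagged word appears in the utterance.

def feature_value(feature_words, feature_decision, tagged_utterance, t):
    fires = (feature_decision == t
             and all(w in tagged_utterance for w in feature_words))
    return 1 if fires else 0

# The example feature from the text: f_(new-2 mail-1),(T=1)
feature = ({"new-2", "mail-1"}, 1)
print(feature_value(*feature, ["check-3", "new-2", "mail-1"], t=1))  # 1
```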
  • Turning again to FIG. 4, after the feature extractor 46 produces feature functions 41, a feature weight calculator 47 calculates feature weights 42, including a weight $\alpha_i$ for feature function $f^{i}_{t,s}$, for all n feature functions. [0049] In one embodiment of the invention, to calculate feature weights 42, an Improved Iterative Scaling algorithm described in S. Della Pietra et al., "Inducing Features of Random Fields," Technical Report CMU-CS-95-144, School of Computer Science, Carnegie Mellon University, 1995, incorporated herein by reference, is used. The maximum entropy model, as derived in A. Ratnaparkhi, "A Simple Introduction to Maximum Entropy Models for Natural Language Processing," Institute for Research in Cognitive Science, Report 97-08, University of Pennsylvania, May 1997, incorporated herein by reference, for the joint distribution P(T, S) is given by

    $$P(T, S) = \mu \prod_{i=1}^{n} \alpha_i^{\,f^{i}_{t,s}(T, S)} \qquad \text{(EQ. 2)}$$

    where μ is a normalization constant. [0050]
  • Returning to FIG. 3, for every utterance in recognized text 30, input processor 44 augments each word in the utterance with its position relative to a given hypothesized command boundary location, and repeats this augmentation for all possible command boundary locations. Feature detector 43 determines which feature functions 41 are present in a given processed utterance, and decision maker 45 makes the final decision as to whether or not the given processed utterance is a complete command. The decision maker first calculates P(T=1|S), given by

    $$P(T = 1 \mid S) = \frac{P(T = 1, S)}{P(T = 1, S) + P(T = 0, S)} \qquad \text{(EQ. 3)}$$

    [0051]
  • and the utterance S is declared a complete command if and only if

    $$P(T = 1 \mid S) > P(T = 0 \mid S) \qquad \text{(EQ. 4)}$$

    [0052]
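  • Taken together, EQs. 2 through 4 mean the decision maker only needs, for each value of T, the product of the weights of the features that are present; the normalization constant μ cancels in the comparison of EQ. 4. The sketch below illustrates this with made-up features and weights (in practice the weights come from feature weight calculator 47).

```python
# Sketch of decision maker 45 (EQs. 2-4). Features and weights here are
# illustrative; mu cancels when comparing P(T=1|S) with P(T=0|S).

def unnormalized_p(tagged_utterance, t, features, weights):
    """Product of alpha_i over active features, i.e. P(T=t, S) up to mu."""
    p = 1.0
    for (words, decision), alpha in zip(features, weights):
        if decision == t and all(w in tagged_utterance for w in words):
            p *= alpha
    return p

def is_complete_command(tagged_utterance, features, weights):
    p1 = unnormalized_p(tagged_utterance, 1, features, weights)
    p0 = unnormalized_p(tagged_utterance, 0, features, weights)
    return p1 > p0                     # EQ. 4; EQ. 3's denominator is shared

features = [({"new-2", "mail-1"}, 1), ({"mail-2", "show-1"}, 0)]
weights = [4.0, 2.5]                   # hypothetical alphas
print(is_complete_command(["check-3", "new-2", "mail-1"], features, weights))
# True
```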
  • One embodiment of the present invention improves performance by using a new token to indicate the beginning of the utterance. Using a token of, for example, "SB" to indicate the beginning of the utterance, the entries in the processed training set may look like the following: [0053]
  • SB−4 check−3 new−2 mail−1 // T=1 [0054]
  • SB−5 check−4 new−3 mail−2 show−1 // T=0 [0055]
  • SB−4 check−3 new−2 mail−1 show+1 me+2 // T=1 [0056]
  • SB−5 check−4 new−3 mail−2 show−1 me+1 // T=0 [0057]
  • This adds an additional step to the processing of input processor 44 in FIG. 3 and FIG. 4. In FIG. 4, every utterance in training data 70 may include the SB token at the beginning of each utterance. In FIG. 3, the SB token may be inserted before the first utterance, and for subsequent utterances, it is preferably inserted after every declared command boundary. Other tokens and placements thereof are contemplated as well by the present invention. [0058]
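  • The token insertion itself is straightforward; a sketch follows, with illustrative names. The SB token goes before the first utterance and again after every declared command boundary, so that each new command begins with a marker.

```python
# Hedged sketch of SB-token insertion. `declared_boundaries` holds the word
# indices after which a command boundary has already been declared.

def insert_sb(stream_words, declared_boundaries):
    out = ["SB"]                       # before the first utterance
    for i, w in enumerate(stream_words, start=1):
        out.append(w)
        if i in declared_boundaries:   # after every declared boundary
            out.append("SB")
    return out

print(" ".join(insert_sb(
    ["check", "new", "mail", "show", "me", "the", "first", "message"],
    declared_boundaries={3})))
# SB check new mail SB show me the first message
```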
  • Referring to FIG. 5, a block/flow diagram of an example of another embodiment of the present invention is shown. Speech recognizer 20 includes a language model 21 and other components 22. In one embodiment of the present invention, language model 21 is enhanced so that speech recognizer 20 can produce recognized text 30 that also includes a new token, for example, SE, that suggests a possible location for the command boundary. With this enhancement, speech recognizer 20 produces recognized text 30 that includes utterances, for example, "check new mail SE show me the first message SE . . . ". To accomplish this, language model 21 is preferably built using data that has the SE tokens inserted at the end of each complete command. The language model may be built using procedures described in the F. Jelinek reference, incorporated herein by reference. To support the new SE tokens, acoustic baseforms for this token are added to speech recognizer 20. Acoustic baseforms for this token, corresponding to various forms of silences, are added to the model. In one embodiment, the following acoustic baseforms are used for the SE token: [0059]
  • D$ [0060]
  • X [0061]
  • XX [0062]
  • XXX [0063]
  • X AA X [0064]
  • X AO M X [0065]
  • X AO X [0066]
  • X AX X [0067]
  • X F X [0068]
  • X HH X [0069]
  • X K X [0070]
  • X P X [0071]
  • X TD X [0072]
  • Referring again to FIG. 4, training data 70 is first subjected to speech recognizer 20 to produce the SE tokens, and input processor 44 generates processed data that may look like: [0073]
  • SB−5 check−4 new−3 mail−2 SE−1 // T=1 [0074]
  • SB−6 check−5 new−4 mail−3 SE−2 show−1 // T=0 [0075]
  • SB−5 check−4 new−3 mail−2 SE−1 show+1 me+2 // T=1 [0076]
  • SB−6 check−5 new−4 mail−3 show−2 SE−1 me+1 // T=0 [0077]
  • Another embodiment of the present invention uses any extended period of silence present in the utterances. With this embodiment, decision maker 45 in FIG. 3 may declare a command boundary if the condition specified by EQ. 4 is satisfied, or if there is an extended period of silence between utterances. In one embodiment of the invention, if there is silence for 3 seconds or more, for example, decision maker 45 declares a command boundary. In another embodiment of the invention, the user may choose the desirable length of the silence by means of options provided by the interface to system 8. [0078]
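  • The combined decision rule of this embodiment can be stated compactly: declare a boundary when EQ. 4 is satisfied, or when the measured inter-utterance silence reaches a configurable threshold. A minimal sketch, assuming the recognizer reports silence durations in seconds and using the 3-second figure from the example above:

```python
# Sketch of the silence-augmented decision rule. `eq4_holds` would be the
# output of the maximum entropy comparison (EQ. 4); the threshold is the
# user-selectable option mentioned above.

def declare_boundary(eq4_holds, silence_seconds, silence_threshold=3.0):
    return eq4_holds or silence_seconds >= silence_threshold
```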
  • The invention described herein for identifying the command boundaries may also be used to recognize the presence of multiple commands in the same utterance. A command boundary may be placed after each portion of the utterance corresponding to a complete command, thus decomposing the input utterance into multiple commands. For example, if the sentence “check for new mail show me the first one” is input the output could be: [0079]
  • SB−5 check−4 new−3 mail−2 SE−1 // T=1 [0080]
  • SB−7 show−6 me−5 the−4 first−3 one−2 SE−1 // T=1. [0081]
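  • Decomposition follows directly from the boundary test: scan the utterance left to right and, each time the identifier declares a complete command, emit the words consumed so far and continue from that point. A minimal sketch, with a toy stand-in for the real identifier:

```python
# Hedged sketch of multi-command decomposition via repeated boundary tests.

def split_commands(words, is_complete_command):
    """Greedily emit a command each time the identifier declares a boundary."""
    commands, start = [], 0
    for end in range(1, len(words) + 1):
        if is_complete_command(words[start:end]):
            commands.append(words[start:end])
            start = end
    return commands

# Toy stand-in for the real identifier, for illustration only:
known = {("check", "for", "new", "mail"),
         ("show", "me", "the", "first", "one")}
print(split_commands("check for new mail show me the first one".split(),
                     lambda ws: tuple(ws) in known))
# [['check', 'for', 'new', 'mail'], ['show', 'me', 'the', 'first', 'one']]
```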
  • Having described preferred embodiments of a command boundary identifier for conversational natural language (which are intended to be illustrative and not limiting), it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments of the invention disclosed which are within the scope and spirit of the invention as outlined by the appended claims. Having thus described the invention with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims. [0082]

Claims (32)

What is claimed is:
1. An apparatus for automatically identifying command boundaries in a conversational natural language system, comprising:
a speech recognizer for converting an input signal to recognized text; and
a boundary identifier coupled to the speech recognizer for receiving the recognized text and determining if a command is present in the recognized text, the boundary identifier outputting the command if present in the recognized text.
2. The apparatus as recited in claim 1, wherein the boundary identifier outputs to an application which executes the command.
3. The apparatus as recited in claim 1, wherein the boundary identifier includes an input processor for processing the recognized text.
4. The apparatus as recited in claim 3, wherein the input processor processes the recognized text by augmenting each word in the recognized text by the word's relative position with respect to a hypothesized command boundary.
5. The apparatus as recited in claim 3, wherein the boundary identifier further comprises a feature detector coupled to the input processor, the feature detector for determining which feature functions are present, from a set of feature functions, in the processed recognized text.
6. The apparatus as recited in claim 5, wherein the boundary identifier further comprises a decision maker for determining if a command is present in the processed recognized text according to a set of feature weights corresponding to the feature functions in the processed recognized text.
7. The apparatus defined in claim 6, wherein the decision maker is coupled to the feature detector and decides if the processed recognized text includes a command boundary.
8. The apparatus as recited in claim 1, further comprising a training system for training the apparatus to recognize text and to recognize complete commands.
9. The apparatus as recited in claim 8, wherein the training system includes an input processor for processing a collection of training data comprising utterances which include complete commands and other than complete commands.
10. The apparatus as recited in claim 9, wherein the input processor inserts a token before each utterance in the training data.
11. The apparatus as recited in claim 9, wherein the input processor inserts a token before a first utterance in the recognized text, and after every command in the recognized text.
12. The apparatus as recited in claim 8, further comprising a feature extractor for extracting feature functions including words and relative positions of the words with respect to a hypothesized boundary location.
13. The apparatus as recited in claim 1, wherein the speech recognizer includes a language model that has been trained using training data, the training data including a token inserted to indicate a location of a command boundary in the training data.
14. The apparatus as recited in claim 13, wherein the speech recognizer includes additional baseforms for the token.
15. The apparatus as recited in claim 14, wherein the speech recognizer produces the recognized text including the token.
16. The apparatus as recited in claim 1, wherein the boundary identifier declares a command boundary when there is an extended period of silence in the recognized text.
17. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for identifying commands in recognized text, the method steps comprising:
inputting recognized text;
processing the recognized text by augmenting words of the recognized text with a position relative to a hypothesized command boundary;
determining feature functions in the processed recognized text in accordance with a set of feature functions;
deciding whether the processed recognized text with feature functions identified includes a command, the decision being made based on weighting of feature functions; and
if a command is included, outputting the command.
18. The program storage device as recited in claim 17, further comprising a program of instructions for training the program storage device by inputting training data including utterances comprising commands and other than commands.
19. The program storage device as recited in claim 18, wherein the program of instructions for training the program storage device includes the step of placing a token before each utterance.
20. The program storage device as recited in claim 18, wherein the program of instructions for training the program storage device includes the step of placing a token after each command boundary included in the utterances.
21. The program storage device as recited in claim 17, wherein the program of instructions for training the program storage device includes the step of extracting feature functions from the training data.
22. The program storage device as recited in claim 17, wherein the program of instructions for training the program storage device includes the step of determining feature weights for all feature functions.
23. The program storage device as recited in claim 17, wherein the program of instructions for processing the recognized text includes the step of placing a token before a first utterance in the recognized text and after each command in the recognized text.
24. The program storage device as recited in claim 17, further including a speech recognizer for providing the recognized text.
25. A method for identifying commands in natural conversational language comprising the steps of:
inputting recognized text;
processing the recognized text by augmenting words of the recognized text with a position relative to a hypothesized command boundary;
determining feature functions in the processed recognized text in accordance with a set of feature functions;
deciding whether the processed recognized text with feature functions identified includes a command, the decision being made based on weighting of feature functions; and
if a command is included, outputting the command.
26. The method as recited in claim 25, further comprising the step of inputting training data including utterances comprising commands and other than commands to an input processor.
27. The method as recited in claim 25, further comprising the step of placing a token before each utterance of the training data.
28. The method as recited in claim 26, further comprising the step of placing a token after command boundaries included in the utterances.
29. The method as recited in claim 26, further comprising the step of extracting feature functions from the training data.
30. The method as recited in claim 26, further comprising the step of determining feature weights for all feature functions.
31. The method as recited in claim 26, further comprising the step of placing a token before a first utterance in the recognized text and after each command in the recognized text.
32. The method as recited in claim 25, further comprising the step of outputting the command to a device for executing the command, the device including a speech recognizer for providing the recognized text.
US09/181,322 1998-10-28 1998-10-28 Command boundary identifier for conversational natural language Expired - Lifetime US6453292B2 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US09/181,322 US6453292B2 (en) 1998-10-28 1998-10-28 Command boundary identifier for conversational natural language
JP23021799A JP3476006B2 (en) 1998-10-28 1999-08-17 Command boundary identification device, method and program storage device
GB9921422A GB2343285B (en) 1998-10-28 1999-09-13 Speech recognition system
CN99121518.4A CN1125436C (en) 1998-10-28 1999-10-14 Command boundary discriminator of conversation natural language

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US09/181,322 US6453292B2 (en) 1998-10-28 1998-10-28 Command boundary identifier for conversational natural language

Publications (2)

Publication Number Publication Date
US20010056344A1 true US20010056344A1 (en) 2001-12-27
US6453292B2 US6453292B2 (en) 2002-09-17

Family

ID=22663797

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/181,322 Expired - Lifetime US6453292B2 (en) 1998-10-28 1998-10-28 Command boundary identifier for conversational natural language

Country Status (4)

Country Link
US (1) US6453292B2 (en)
JP (1) JP3476006B2 (en)
CN (1) CN1125436C (en)
GB (1) GB2343285B (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100076761A1 (en) * 2008-09-25 2010-03-25 Fritsch Juergen Decoding-Time Prediction of Non-Verbalized Tokens
US20120116765A1 (en) * 2009-07-17 2012-05-10 Nec Corporation Speech processing device, method, and storage medium
US20130035938A1 (en) * 2011-08-01 2013-02-07 Electronics And Communications Research Institute Apparatus and method for recognizing voice
US20140095146A1 (en) * 2012-09-28 2014-04-03 International Business Machines Corporation Documentation of system monitoring and analysis procedures
US20150269930A1 (en) * 2014-03-18 2015-09-24 Industrial Technology Research Institute Spoken word generation method and system for speech recognition and computer readable medium thereof
EP2937860A1 (en) * 2014-04-23 2015-10-28 Google, Inc. Speech endpointing based on word comparisons
US20160007130A1 (en) * 2014-07-07 2016-01-07 Adobe Systems Incorporated Performance Metric Based Stopping Criteria for Iterative Algorithms
US9842589B2 (en) 2012-02-27 2017-12-12 Nec Corporation Voice input device, voice input method and program
US10140982B2 (en) * 2012-08-03 2018-11-27 Veveo, Inc. Method for using pauses detected in speech input to assist in interpreting the input during conversational interaction for information retrieval
US10269341B2 (en) 2015-10-19 2019-04-23 Google Llc Speech endpointing
CN109949803A (en) * 2019-02-11 2019-06-28 特斯联(北京)科技有限公司 Building service facility control method and system based on semantic instructions intelligent recognition
US10403275B1 (en) * 2016-07-28 2019-09-03 Josh.ai LLC Speech control for complex commands
CN110797019A (en) * 2014-05-30 2020-02-14 苹果公司 Multi-command single-speech input method
US10593352B2 (en) 2017-06-06 2020-03-17 Google Llc End of query detection
US10929754B2 (en) 2017-06-06 2021-02-23 Google Llc Unified endpointer using multitask and multidomain learning
US10997964B2 (en) * 2014-11-05 2021-05-04 AT&T Intellectual Property I, L.P. System and method for text normalization using atomic tokens
US11062696B2 (en) 2015-10-19 2021-07-13 Google Llc Speech endpointing
US11488603B2 (en) * 2019-06-06 2022-11-01 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for processing speech

Families Citing this family (222)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6954923B1 (en) 1999-01-28 2005-10-11 Ati International Srl Recording classification of instructions executed by a computer
US7941647B2 (en) 1999-01-28 2011-05-10 Ati Technologies Ulc Computer for executing two instruction sets and adds a macroinstruction end marker for performing iterations after loop termination
US8121828B2 (en) 1999-01-28 2012-02-21 Ati Technologies Ulc Detecting conditions for transfer of execution from one computer instruction stream to another and executing transfer on satisfaction of the conditions
US6978462B1 (en) * 1999-01-28 2005-12-20 Ati International Srl Profiling execution of a sequence of events occurring during a profiled execution interval that matches time-independent selection criteria of events to be profiled
US8127121B2 (en) 1999-01-28 2012-02-28 Ati Technologies Ulc Apparatus for executing programs for a first computer architecture on a computer of a second architecture
US8074055B1 (en) 1999-01-28 2011-12-06 Ati Technologies Ulc Altering data storage conventions of a processor when execution flows from first architecture code to second architecture code
WO2001013255A2 (en) 1999-08-13 2001-02-22 Pixo, Inc. Displaying and traversing links in character array
WO2001022228A1 (en) 1999-09-17 2001-03-29 Nortel Networks Limited System and method for producing a verification system for verifying procedure interfaces
US8645137B2 (en) 2000-03-16 2014-02-04 Apple Inc. Fast, language-independent method for user authentication by voice
US20020072914A1 (en) 2000-12-08 2002-06-13 Hiyan Alshawi Method and apparatus for creation and user-customization of speech-enabled services
ITFI20010199A1 (en) 2001-10-22 2003-04-22 Riccardo Vieri SYSTEM AND METHOD TO TRANSFORM TEXTUAL COMMUNICATIONS INTO VOICE AND SEND THEM WITH AN INTERNET CONNECTION TO ANY TELEPHONE SYSTEM
US7398209B2 (en) 2002-06-03 2008-07-08 Voicebox Technologies, Inc. Systems and methods for responding to natural language speech utterance
US7693720B2 (en) 2002-07-15 2010-04-06 Voicebox Technologies, Inc. Mobile systems and methods for responding to natural language speech utterance
US7669134B1 (en) 2003-05-02 2010-02-23 Apple Inc. Method and apparatus for displaying information during an instant messaging session
US6925928B2 (en) * 2003-09-18 2005-08-09 Anthony Fox Trash compactor for fast food restaurant waste
US7680659B2 (en) * 2005-06-01 2010-03-16 Microsoft Corporation Discriminative training for language modeling
US7640160B2 (en) 2005-08-05 2009-12-29 Voicebox Technologies, Inc. Systems and methods for responding to natural language speech utterance
US7620549B2 (en) 2005-08-10 2009-11-17 Voicebox Technologies, Inc. System and method of supporting adaptive misrecognition in conversational speech
US7949529B2 (en) 2005-08-29 2011-05-24 Voicebox Technologies, Inc. Mobile systems and methods of supporting natural language human-machine interactions
US7634409B2 (en) 2005-08-31 2009-12-15 Voicebox Technologies, Inc. Dynamic speech sharpening
US8677377B2 (en) 2005-09-08 2014-03-18 Apple Inc. Method and apparatus for building an intelligent automated assistant
US7633076B2 (en) 2005-09-30 2009-12-15 Apple Inc. Automated response to and sensing of user activity in portable devices
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US7805305B2 (en) * 2006-10-12 2010-09-28 Nuance Communications, Inc. Enhancement to Viterbi speech processing algorithm for hybrid speech models that conserves memory
US8073681B2 (en) 2006-10-16 2011-12-06 Voicebox Technologies, Inc. System and method for a cooperative conversational voice user interface
US7818176B2 (en) 2007-02-06 2010-10-19 Voicebox Technologies, Inc. System and method for selecting and presenting advertisements based on natural language processing of voice-based input
US8977255B2 (en) 2007-04-03 2015-03-10 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US9053089B2 (en) 2007-10-02 2015-06-09 Apple Inc. Part-of-speech tagging using latent analogy
US8595642B1 (en) 2007-10-04 2013-11-26 Great Northern Research, LLC Multiple shell multi faceted graphical user interface
US8165886B1 (en) 2007-10-04 2012-04-24 Great Northern Research LLC Speech interface system and method for control and interaction with applications on a computing system
US8364694B2 (en) 2007-10-26 2013-01-29 Apple Inc. Search assistant for digital media assets
CN101424973A (en) * 2007-11-02 2009-05-06 Sharp Corporation Input device
US8620662B2 (en) 2007-11-20 2013-12-31 Apple Inc. Context-aware unit selection
US8140335B2 (en) 2007-12-11 2012-03-20 Voicebox Technologies, Inc. System and method for providing a natural language voice user interface in an integrated voice navigation services environment
US10002189B2 (en) 2007-12-20 2018-06-19 Apple Inc. Method and apparatus for searching using an active ontology
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US8327272B2 (en) 2008-01-06 2012-12-04 Apple Inc. Portable multifunction device, method, and graphical user interface for viewing and managing electronic calendars
US9177551B2 (en) * 2008-01-22 2015-11-03 At&T Intellectual Property I, L.P. System and method of providing speech processing in user interface
US8065143B2 (en) 2008-02-22 2011-11-22 Apple Inc. Providing text input using speech data and non-speech data
US8289283B2 (en) 2008-03-04 2012-10-16 Apple Inc. Language input interface on a device
US8996376B2 (en) 2008-04-05 2015-03-31 Apple Inc. Intelligent text-to-speech conversion
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US8589161B2 (en) 2008-05-27 2013-11-19 Voicebox Technologies, Inc. System and method for an integrated, multi-modal, multi-device natural language voice services environment
US9305548B2 (en) 2008-05-27 2016-04-05 Voicebox Technologies Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
US8464150B2 (en) 2008-06-07 2013-06-11 Apple Inc. Automatic language identification for dynamic text processing
US20100030549A1 (en) 2008-07-31 2010-02-04 Lee Michael M Mobile device having human language translation capability with positional feedback
US8768702B2 (en) 2008-09-05 2014-07-01 Apple Inc. Multi-tiered voice feedback in an electronic device
US8898568B2 (en) 2008-09-09 2014-11-25 Apple Inc. Audio user interface
US8352268B2 (en) 2008-09-29 2013-01-08 Apple Inc. Systems and methods for selective rate of speech and speech preferences for text to speech synthesis
US8355919B2 (en) 2008-09-29 2013-01-15 Apple Inc. Systems and methods for text normalization for text to speech synthesis
US8712776B2 (en) 2008-09-29 2014-04-29 Apple Inc. Systems and methods for selective text to speech synthesis
US8352272B2 (en) 2008-09-29 2013-01-08 Apple Inc. Systems and methods for text to speech synthesis
US8583418B2 (en) 2008-09-29 2013-11-12 Apple Inc. Systems and methods of detecting language and natural language strings for text to speech synthesis
US8396714B2 (en) 2008-09-29 2013-03-12 Apple Inc. Systems and methods for concatenation of words in text to speech synthesis
US8676904B2 (en) 2008-10-02 2014-03-18 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
WO2010067118A1 (en) 2008-12-11 2010-06-17 Novauris Technologies Limited Speech recognition involving a mobile device
US8494857B2 (en) 2009-01-06 2013-07-23 Regents Of The University Of Minnesota Automatic measurement of speech fluency
US8862252B2 (en) 2009-01-30 2014-10-14 Apple Inc. Audio user interface for displayless electronic device
US8326637B2 (en) 2009-02-20 2012-12-04 Voicebox Technologies, Inc. System and method for processing multi-modal device interactions in a natural language voice services environment
US8380507B2 (en) 2009-03-09 2013-02-19 Apple Inc. Systems and methods for determining the language to use for speech generated by a text to speech engine
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US20120311585A1 (en) 2011-06-03 2012-12-06 Apple Inc. Organizing task items that represent tasks to perform
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10540976B2 (en) 2009-06-05 2020-01-21 Apple Inc. Contextual voice commands
US9431006B2 (en) 2009-07-02 2016-08-30 Apple Inc. Methods and apparatuses for automatic speech recognition
WO2011059997A1 (en) 2009-11-10 2011-05-19 Voicebox Technologies, Inc. System and method for providing a natural language content dedication service
US9171541B2 (en) 2009-11-10 2015-10-27 Voicebox Technologies Corporation System and method for hybrid processing in a natural language voice services environment
US8682649B2 (en) 2009-11-12 2014-03-25 Apple Inc. Sentiment prediction from textual data
US8600743B2 (en) 2010-01-06 2013-12-03 Apple Inc. Noise profile determination for voice-related feature
US8381107B2 (en) 2010-01-13 2013-02-19 Apple Inc. Adaptive audio feedback system and method
US8311838B2 (en) 2010-01-13 2012-11-13 Apple Inc. Devices and methods for identifying a prompt corresponding to a voice input in a sequence of prompts
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US8626511B2 (en) * 2010-01-22 2014-01-07 Google Inc. Multi-dimensional disambiguation of voice commands
WO2011089450A2 (en) 2010-01-25 2011-07-28 Andrew Peter Nelson Jerram Apparatuses, methods and systems for a digital conversation management platform
US8682667B2 (en) 2010-02-25 2014-03-25 Apple Inc. User profiling for selecting user specific voice input processing information
US8639516B2 (en) 2010-06-04 2014-01-28 Apple Inc. User-specific noise suppression for voice quality improvements
US8713021B2 (en) 2010-07-07 2014-04-29 Apple Inc. Unsupervised document clustering using latent semantic density analysis
US9104670B2 (en) 2010-07-21 2015-08-11 Apple Inc. Customized search or acquisition of digital media assets
US8719006B2 (en) 2010-08-27 2014-05-06 Apple Inc. Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis
US8719014B2 (en) 2010-09-27 2014-05-06 Apple Inc. Electronic device with text error correction based on voice recognition data
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
US10515147B2 (en) 2010-12-22 2019-12-24 Apple Inc. Using statistical language models for contextual lookup
US8781836B2 (en) 2011-02-22 2014-07-15 Apple Inc. Hearing assistance system for providing consistent human speech
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US20120310642A1 (en) 2011-06-03 2012-12-06 Apple Inc. Automatically creating a mapping between text data and audio data
US8812294B2 (en) 2011-06-21 2014-08-19 Apple Inc. Translating phrases from one language into another using an order-based set of declarative rules
US8706472B2 (en) 2011-08-11 2014-04-22 Apple Inc. Method for disambiguating multiple readings in language conversion
US8994660B2 (en) 2011-08-29 2015-03-31 Apple Inc. Text correction processing
US8762156B2 (en) 2011-09-28 2014-06-24 Apple Inc. Speech recognition repair using contextual information
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9576593B2 (en) 2012-03-15 2017-02-21 Regents Of The University Of Minnesota Automated verbal fluency assessment
US9317605B1 (en) 2012-03-21 2016-04-19 Google Inc. Presenting forked auto-completions
US9280610B2 (en) 2012-05-14 2016-03-08 Apple Inc. Crowd sourcing information to fulfill user requests
US8775442B2 (en) 2012-05-15 2014-07-08 Apple Inc. Semantic search using a single-source semantic model
US10417037B2 (en) 2012-05-15 2019-09-17 Apple Inc. Systems and methods for integrating third party services with a digital assistant
WO2013185109A2 (en) 2012-06-08 2013-12-12 Apple Inc. Systems and methods for recognizing textual identifiers within a plurality of words
US9721563B2 (en) 2012-06-08 2017-08-01 Apple Inc. Name recognition system
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
CN102855720A (en) * 2012-09-11 2013-01-02 深圳市豪恩安全科技有限公司 Photoelectric beam detector capable of automatically switching
US9547647B2 (en) 2012-09-19 2017-01-17 Apple Inc. Voice-based media searching
US8935167B2 (en) 2012-09-25 2015-01-13 Apple Inc. Exemplar-based latent perceptual modeling for automatic speech recognition
KR102516577B1 (en) 2013-02-07 2023-04-03 애플 인크. Voice trigger for a digital assistant
US10572476B2 (en) 2013-03-14 2020-02-25 Apple Inc. Refining a search based on schedule items
US9977779B2 (en) 2013-03-14 2018-05-22 Apple Inc. Automatic supplementation of word correction dictionaries
US9733821B2 (en) 2013-03-14 2017-08-15 Apple Inc. Voice control to diagnose inadvertent activation of accessibility features
US10642574B2 (en) 2013-03-14 2020-05-05 Apple Inc. Device, method, and graphical user interface for outputting captions
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
US10652394B2 (en) 2013-03-14 2020-05-12 Apple Inc. System and method for processing voicemail
WO2014144949A2 (en) 2013-03-15 2014-09-18 Apple Inc. Training an at least partial voice command system
US11151899B2 (en) 2013-03-15 2021-10-19 Apple Inc. User training by intelligent digital assistant
US10748529B1 (en) 2013-03-15 2020-08-18 Apple Inc. Voice activated device for use with a voice-based digital assistant
CN112230878A (en) 2013-03-15 2021-01-15 苹果公司 Context-sensitive handling of interrupts
WO2014144579A1 (en) 2013-03-15 2014-09-18 Apple Inc. System and method for updating an adaptive speech recognition model
WO2014197334A2 (en) 2013-06-07 2014-12-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
WO2014197336A1 (en) 2013-06-07 2014-12-11 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
WO2014197335A1 (en) 2013-06-08 2014-12-11 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
EP3008641A1 (en) 2013-06-09 2016-04-20 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
CN105265005B (en) 2013-06-13 2019-09-17 苹果公司 System and method for the urgent call initiated by voice command
US9646606B2 (en) 2013-07-03 2017-05-09 Google Inc. Speech recognition using domain knowledge
CN103345922B (en) * 2013-07-05 2016-07-06 Zhang Wei Fully automatic segmentation method for long-duration speech
WO2015020942A1 (en) 2013-08-06 2015-02-12 Apple Inc. Auto-activating smart responses based on activities from remote devices
US10296160B2 (en) 2013-12-06 2019-05-21 Apple Inc. Method for extracting salient dialog usage from live data
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US9898459B2 (en) 2014-09-16 2018-02-20 Voicebox Technologies Corporation Integration of domain information into state transitions of a finite state transducer for natural language processing
EP3195145A4 (en) 2014-09-16 2018-01-24 VoiceBox Technologies Corporation Voice commerce
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
CN107003999B (en) 2014-10-15 2020-08-21 声钰科技 System and method for subsequent response to a user's prior natural language input
US10431214B2 (en) 2014-11-26 2019-10-01 Voicebox Technologies Corporation System and method of determining a domain and/or an action related to a natural language input
US10614799B2 (en) 2014-11-26 2020-04-07 Voicebox Technologies Corporation System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US9578173B2 (en) 2015-06-05 2017-02-21 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
DK179309B1 (en) 2016-06-09 2018-04-23 Apple Inc Intelligent automated assistant in a home environment
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10586535B2 (en) 2016-06-10 2020-03-10 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
DK179049B1 (en) 2016-06-11 2017-09-18 Apple Inc Data driven natural language event detection and classification
DK179343B1 (en) 2016-06-11 2018-05-14 Apple Inc Intelligent task discovery
DK179415B1 (en) 2016-06-11 2018-06-14 Apple Inc Intelligent device arbitration and control
DK201670540A1 (en) 2016-06-11 2018-01-08 Apple Inc Application integration with a digital assistant
US10453074B2 (en) 2016-07-08 2019-10-22 Asapp, Inc. Automatically suggesting resources for responding to a request
US10083451B2 (en) 2016-07-08 2018-09-25 Asapp, Inc. Using semantic processing for customer support
US10331784B2 (en) 2016-07-29 2019-06-25 Voicebox Technologies Corporation System and method of disambiguating natural language processing requests
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
JP2018048965A (en) * 2016-09-23 2018-03-29 Saginomiya Seisakusho, Inc. Pressure sensor
US10109275B2 (en) * 2016-12-19 2018-10-23 Asapp, Inc. Word hash language model
US10650311B2 (en) 2016-12-19 2020-05-12 Asapp, Inc. Suggesting resources using context hashing
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
CN107146602B (en) * 2017-04-10 2020-10-02 Beijing Orion Star Technology Co., Ltd. Voice recognition method and device and electronic equipment
DK201770439A1 (en) 2017-05-11 2018-12-13 Apple Inc. Offline personal assistant
DK179496B1 (en) 2017-05-12 2019-01-15 Apple Inc. USER-SPECIFIC Acoustic Models
DK179745B1 (en) 2017-05-12 2019-05-01 Apple Inc. SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT
DK201770432A1 (en) 2017-05-15 2018-12-21 Apple Inc. Hierarchical belief states for digital assistants
DK201770431A1 (en) 2017-05-15 2018-12-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
DK179560B1 (en) 2017-05-16 2019-02-18 Apple Inc. Far-field extension for digital assistant services
US10497004B2 (en) 2017-12-08 2019-12-03 Asapp, Inc. Automating communications using an intent classifier
US10489792B2 (en) 2018-01-05 2019-11-26 Asapp, Inc. Maintaining quality of customer support messages
US10210244B1 (en) 2018-02-12 2019-02-19 Asapp, Inc. Updating natural language interfaces by processing usage data
US10586538B2 (en) 2018-04-25 2020-03-10 Comcast Cable Communications, LLC Microphone array beamforming control
US10169315B1 (en) 2018-04-27 2019-01-01 Asapp, Inc. Removing personal information from text using a neural network
US11216510B2 (en) 2018-08-03 2022-01-04 Asapp, Inc. Processing an incomplete message with a neural network to generate suggested messages
US10747957B2 (en) 2018-11-13 2020-08-18 Asapp, Inc. Processing communications using a prototype classifier
US11551004B2 (en) 2018-11-13 2023-01-10 Asapp, Inc. Intent discovery with a prototype classifier
US11425064B2 (en) 2019-10-25 2022-08-23 Asapp, Inc. Customized message suggestion with user embedding vectors

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03203794A (en) 1989-12-29 1991-09-05 Pioneer Electron Corp Voice remote controller
JP2764343B2 (en) * 1990-09-07 1998-06-11 富士通株式会社 Clause / phrase boundary extraction method
JP2924555B2 (en) * 1992-10-02 1999-07-26 三菱電機株式会社 Speech recognition boundary estimation method and speech recognition device
KR950704772A (en) * 1993-10-15 1995-11-20 데이비드 엠. 로젠블랫 A method for training a system, the resulting apparatus, and method of use
US5594834A (en) * 1994-09-30 1997-01-14 Motorola, Inc. Method and system for recognizing a boundary between sounds in continuous speech
US5729656A (en) * 1994-11-30 1998-03-17 International Business Machines Corporation Reduction of search space in speech recognition using phone boundaries and phone ranking
US5638487A (en) * 1994-12-30 1997-06-10 Purespeech, Inc. Automatic speech recognition
US5794196A (en) * 1995-06-30 1998-08-11 Kurzweil Applied Intelligence, Inc. Speech recognition system distinguishing dictation from commands by arbitration between continuous speech and isolated word modules
US5794189A (en) * 1995-11-13 1998-08-11 Dragon Systems, Inc. Continuous speech recognition

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8918317B2 (en) * 2008-09-25 2014-12-23 Multimodal Technologies, Llc Decoding-time prediction of non-verbalized tokens
US20100076761A1 (en) * 2008-09-25 2010-03-25 Fritsch Juergen Decoding-Time Prediction of Non-Verbalized Tokens
US20120116765A1 (en) * 2009-07-17 2012-05-10 Nec Corporation Speech processing device, method, and storage medium
US9583095B2 (en) * 2009-07-17 2017-02-28 Nec Corporation Speech processing device, method, and storage medium
US20130035938A1 (en) * 2011-08-01 2013-02-07 Electronics And Communications Research Institute Apparatus and method for recognizing voice
US9842589B2 (en) 2012-02-27 2017-12-12 Nec Corporation Voice input device, voice input method and program
US10140982B2 (en) * 2012-08-03 2018-11-27 Veveo, Inc. Method for using pauses detected in speech input to assist in interpreting the input during conversational interaction for information retrieval
US20140095146A1 (en) * 2012-09-28 2014-04-03 International Business Machines Corporation Documentation of system monitoring and analysis procedures
US9189465B2 (en) * 2012-09-28 2015-11-17 International Business Machines Corporation Documentation of system monitoring and analysis procedures
US9691389B2 (en) * 2014-03-18 2017-06-27 Industrial Technology Research Institute Spoken word generation method and system for speech recognition and computer readable medium thereof
US20150269930A1 (en) * 2014-03-18 2015-09-24 Industrial Technology Research Institute Spoken word generation method and system for speech recognition and computer readable medium thereof
EP2937860A1 (en) * 2014-04-23 2015-10-28 Google, Inc. Speech endpointing based on word comparisons
US20200043466A1 (en) * 2014-04-23 2020-02-06 Google Llc Speech endpointing based on word comparisons
EP3767620A3 (en) * 2014-04-23 2021-04-07 Google LLC Speech endpointing based on word comparisons
US10140975B2 (en) 2014-04-23 2018-11-27 Google Llc Speech endpointing based on word comparisons
US11004441B2 (en) 2014-04-23 2021-05-11 Google Llc Speech endpointing based on word comparisons
US20190043480A1 (en) * 2014-04-23 2019-02-07 Google Llc Speech endpointing based on word comparisons
US9607613B2 (en) 2014-04-23 2017-03-28 Google Inc. Speech endpointing based on word comparisons
US11636846B2 (en) 2014-04-23 2023-04-25 Google Llc Speech endpointing based on word comparisons
US10546576B2 (en) 2014-04-23 2020-01-28 Google Llc Speech endpointing based on word comparisons
US20210151041A1 (en) * 2014-05-30 2021-05-20 Apple Inc. Multi-command single utterance input method
US11670289B2 (en) * 2014-05-30 2023-06-06 Apple Inc. Multi-command single utterance input method
CN110797019A (en) * 2014-05-30 2020-02-14 苹果公司 Multi-command single-speech input method
US20160007130A1 (en) * 2014-07-07 2016-01-07 Adobe Systems Incorporated Performance Metric Based Stopping Criteria for Iterative Algorithms
US9866954B2 (en) * 2014-07-07 2018-01-09 Adobe Systems Incorporated Performance metric based stopping criteria for iterative algorithms
US10997964B2 (en) * 2014-11-05 2021-05-04 At&T Intellectual Property 1, L.P. System and method for text normalization using atomic tokens
US11062696B2 (en) 2015-10-19 2021-07-13 Google Llc Speech endpointing
US10269341B2 (en) 2015-10-19 2019-04-23 Google Llc Speech endpointing
US11710477B2 (en) 2015-10-19 2023-07-25 Google Llc Speech endpointing
US10714087B2 (en) * 2016-07-28 2020-07-14 Josh.ai LLC Speech control for complex commands
US10403275B1 (en) * 2016-07-28 2019-09-03 Josh.ai LLC Speech control for complex commands
US10929754B2 (en) 2017-06-06 2021-02-23 Google Llc Unified endpointer using multitask and multidomain learning
US10593352B2 (en) 2017-06-06 2020-03-17 Google Llc End of query detection
US11551709B2 (en) 2017-06-06 2023-01-10 Google Llc End of query detection
US11676625B2 (en) 2017-06-06 2023-06-13 Google Llc Unified endpointer using multitask and multidomain learning
CN109949803A (en) * 2019-02-11 2019-06-28 特斯联(北京)科技有限公司 Building service facility control method and system based on semantic instructions intelligent recognition
US11488603B2 (en) * 2019-06-06 2022-11-01 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for processing speech

Also Published As

Publication number Publication date
CN1125436C (en) 2003-10-22
GB9921422D0 (en) 1999-11-10
GB2343285A (en) 2000-05-03
JP2000132186A (en) 2000-05-12
GB2343285B (en) 2003-06-25
US6453292B2 (en) 2002-09-17
CN1252592A (en) 2000-05-10
JP3476006B2 (en) 2003-12-10

Similar Documents

Publication Publication Date Title
US6453292B2 (en) Command boundary identifier for conversational natural language
US11848001B2 (en) Systems and methods for providing non-lexical cues in synthesized speech
JP4195428B2 (en) Speech recognition using multiple speech features
US7162423B2 (en) Method and apparatus for generating and displaying N-Best alternatives in a speech recognition system
EP1447792B1 (en) Method and apparatus for modeling a speech recognition system and for predicting word error rates from text
US20020188446A1 (en) Method and apparatus for distribution-based language model adaptation
US20020120447A1 (en) Speech processing system
US6801891B2 (en) Speech processing system
US20220262352A1 (en) Improving custom keyword spotting system accuracy with text-to-speech-based data augmentation
US20020038207A1 (en) Systems and methods for word prediction and speech recognition
US20210193117A1 (en) Syllable based automatic speech recognition
US10360904B2 (en) Methods and apparatus for speech recognition using a garbage model
US7627473B2 (en) Hidden conditional random field models for phonetic classification and speech recognition
KR101988165B1 (en) Method and system for improving the accuracy of speech recognition technology based on text data analysis for deaf students
JP6810580B2 (en) Language model learning device and its program
US20050187767A1 (en) Dynamic N-best algorithm to reduce speech recognition errors
CN110853669B (en) Audio identification method, device and equipment
KR100480790B1 (en) Method and apparatus for continous speech recognition using bi-directional n-gram language model
CN112185346B (en) Multilingual voice keyword detection and model generation method and electronic equipment
Khalifa et al. Statistical modeling for speech recognition
Ramaswamy et al. Automatic identification of command boundaries in a conversational natural language user interface
CN116186201A (en) Government affair item searching method and device based on voice recognition, medium and equipment
Razik et al. Frame-synchronous and local confidence measures for on-the-fly keyword spotting
Knill et al. CUED/F-INFENG/TR 230
JP2005010464A (en) Device, method, and program for speech recognition

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:RAMASWAMY, GANESH N.;KLEINDIENST, JAN;REEL/FRAME:009565/0885;SIGNING DATES FROM 19981022 TO 19981026

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

AS Assignment

Owner name: NUANCE COMMUNICATIONS, INC., MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTERNATIONAL BUSINESS MACHINES CORPORATION;REEL/FRAME:022354/0566

Effective date: 20081231

FPAY Fee payment

Year of fee payment: 8

FPAY Fee payment

Year of fee payment: 12