US5509104A - Speech recognition employing key word modeling and non-key word modeling - Google Patents
- Publication number
- US5509104A (application US08/132,430)
- Authority
- US
- United States
- Prior art keywords
- speech recognition
- recognition system
- key
- speech
- models
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
Definitions
- This invention relates to techniques for automatic recognition of speech including selected key words.
- HMM Hidden Markov Model
- a statistically-based model commonly called a Hidden Markov Model (hereinafter, HMM)
- Our invention is based on the grammatical concept of the above-cited Wilpon et al. reference.
- We do this by creating at least one hidden Markov model representative of extraneous speech.
- a grammar-driven continuous word recognition system is used to determine the best sequence of extraneous speech and keywords.
- sink general
- FIG. 1 shows a general flow diagram of the recognition system in which our invention can be used.
- FIG. 2 shows a diagram of the state-transitional model and related parameters used in our invention.
- FIG. 3 shows the most typical grammatical sequence occurring in the practice of our invention
- FIGS. 4, 5 and 6 show curves useful in explaining the invention.
- FIG. 7 shows a more detailed flow diagram for the practice of the invention.
- s(n) a representation, derived from a speech signal.
- the speech is digitized, filtered, pre-emphasized and blocked into frames, all procedures being conventional, to produce s(n). While it is not a requirement of our invention, we have found it convenient that s(n) be analyzed to give a set of LPC-derived cepstral vectors.
- the resulting feature vectors, namely LPC and cepstrum 11, obtained using conventional processing of signal s(n), are fed into the model alignment step 13, including valid grammatical rules, where the feature vectors of s(n) are compared to the two types of word reference models described briefly above, in the Summary of the Invention.
- the final best estimate, from box 14, is transmitted as the best keyword, that is, the keyword associated with the best match to the feature vectors of s(n) according to the grammar.
- the digitizing occurs at a 6.67 kHz rate and the filtered speech bandwidth is 100-3200 Hz.
- Other particular sampling rates and filter bandwidths may, of course, be used.
- the LPC and cepstral analysis 11 is then performed, following the techniques set out by L. R. Rabiner et al. in the book Digital Processing of Speech Signals, Prentice Hall, Englewood Cliffs, N.J. (1978) pp. 356-372 and 398-401, and/or following the techniques set out in the paper by B. Bogert et al., "The Quefrency Analysis of Time Series for Echoes", Proc. Symp. on Time Series Analysis, M. Rosenblatt, Ed., Ch. 15, pp. 209-243, J. Wiley, New York, 1963.
- Each frame of speech is weighted by a Hamming window, as set out at page 121 in the above-cited book by L. R. Rabiner et al.
- a p-th order, illustratively 8-th order, linear predictive coding (LPC) analysis is then performed on the data. For each frame, a set of eight LPC coefficients is generated. The resulting signal is then reduced to a sequence of LPC frame vectors, as is known in the art. It should be noted that there is no automatic endpoint detection performed on the data.
- LPC linear predictive coding
- the cepstral derivative i.e. the delta cepstrum vector
- G is a gain term chosen so that the variances of $\hat{c}_l(m)$ and $\Delta\hat{c}_l(m)$ are about the same. (For our system the value of G was 0.375.)
- the overall observation vector, $O_l$, used for scoring the HMMs is the concatenation of the weighted cepstral vector and the corresponding weighted delta cepstrum vector, as given in equation (4).
- the sequence of spectral vectors of an unknown speech utterance is matched against a set of stored word-based hidden Markov models 12 using a frame-synchronous level-building (FSLB) algorithm 13 (described in the article by C-H. Lee et al, "A Network-Based Frame Synchronous Level Building Algorithm for Connected Word Recognition," Conf. Rec. IEEE Int. Conf. Acous. Speech and Sig. Processing, Vol. 1, pp. 410-413, New York, N.Y., April 1988), with Viterbi matching within levels. Word and state duration probabilities, as will be described with reference to FIG. 2, have been incorporated into the HMM scoring and network search in the model alignment procedure 13.
- FSLB frame-synchronous level-building
- a finite state grammar describing the set of valid sentence inputs, described hereinafter with reference to FIG. 3, is used to drive the recognition process.
- the FSLB algorithm in procedure 13 performs a maximum-likelihood string decoding on a frame-by-frame basis, therefore making optimally decoded partial strings available at any time.
- the output of this process is a set of valid candidate strings.
- a segmental k-means training algorithm is used, as set out in the article by L. R. Rabiner et al., "A Segmental K-means Training Procedure for Connected Word Recognition Based on Whole Word Reference Patterns," AT&T Technical Journal, Vol. 65, No. 3, pp. 21-31, May 1986.
- This word-building algorithm i.e. an estimation procedure for determining the parameters of the HMMs
- convergence i.e. until the difference in likelihood scores in consecutive iterations is sufficiently small.
- an HMM-based clustering algorithm is used to split previously defined clusters; see the above-cited article by Soong et al.
- This algorithm, or subsequent improvements, all based on the likelihoods obtained from HMMs, separates out from the set of training tokens those tokens whose likelihood scores fall below some fixed or relative threshold. That is, we separate out all the tokens with poor likelihood scores and create a new model out of these so-called outlier tokens.
- the segmental k-means training algorithm is again used to give the optimal set of parameters for each of the models.
- FIG. 2 illustrates the structure of the HMM's used to characterize individual words as well as the background environment, including extraneous speech.
- the models are first order, left-to-right, Markov models with N states. Each model is completely specified by the following:
- state observation density matrix $B = \{b_j(x)\}$, consisting of a mixture (sum) of M Gaussian densities, of the form $b_j(x) = \sum_{m=1}^{M} c_{mj}\,\mathcal{N}(x;\ \mu_{mj},\ U_{mj})$, where x is the input observation vector, $c_{mj}$ is the mixture weight for the m-th component in state j, $\mu_{mj}$ is the mean vector for mixture m in state j, and $U_{mj}$ is the covariance for mixture m in state j (see the above-cited patent by Juang et al.). All evaluations described in this paper used diagonal covariance matrices. In our evaluations, the number of states per model was set to 10 and the number of mixture components per state, M, was set to nine.
- the grammar used in the recognition process of the present invention is integrated into the recognition process in the same manner as described in the above-cited Lee et al reference.
- This grammar permits the recognition of keywords in a sequence which includes any number of keywords, including zero, interspersed with any number, including zero, of sink (extraneous speech) models and background silence models.
- the grammar is the set of rules which define and limit the valid sequences of recognizable units.
- in the decision rule procedure 14, based upon a comparison of different probability scores, it is decided whether a final decision can be made or whether some alternative system procedure should be invoked.
- the sink models and background models are generated automatically, using the training procedures described above, from a large pool of extraneous speech signals. These signals contain extraneous speech as well as background signal. This will be discussed further below.
- the recognition algorithm just described relies on the ability to create a robust model of non-vocabulary background signals. Our goal is to be able to automatically generate the sink models with no user interaction.
- the simplest training procedure is to generate the sink models from specific words that occur most often in the extraneous speech. This requires that we have a labeled database indicating where such out-of-vocabulary words occur.
- the third, and fully automatic, training procedure that is proposed is to remove all labeling and segmentation constraints on the database used to train the sink model.
- the only requirement is that we have a database which contains the keywords as well as extraneous speech and background noise. Examples of such labeling can be seen in FIGS. 4 thru 6 denoted as Type 3 analysis. Even though a keyword is present in these examples, the entire utterance is used to initially train the sink model.
- FIG. 7 shows a block diagram of the training process used to obtain the final keyword and sink models. To initialize the training process, an HMM set 71 is built from the isolated vocabulary words and the pool of extraneous speech.
- the segmental k-means training algorithm is used to optimally segment the training strings into vocabulary words 75-79, silence 80 and extraneous speech. New models are then created and the process iterates itself to convergence.
- a single sink model was generated, using the fully automatic training procedure just described. Recognition results on a standard recognition task were comparable to the best results obtained from semiautomatic training procedures. This indicates that a single sink model can be generated which incorporates both the characteristics of the extraneous speech and the background silence.
- the algorithm disclosed herein based on hidden Markov model technology, which was shown capable of recognizing a pre-defined set of vocabulary items spoken in the context of fluent unconstrained speech, will allow users more freedom in their speaking manner, thereby making the human-factors issues of speech recognition more manageable.
- the grammatical constraint need not be limited to adjacency, but, instead, could require a selected relationship, such as slight overlap between the acoustic events being matched to a specific model and to a general model.
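
The state observation density described in the list above (a mixture of M Gaussian densities with diagonal covariances) can be illustrated with a short sketch. This is not the patent's implementation; the function names `log_gaussian_diag` and `log_mixture_density` are hypothetical, and working in the log domain with a log-sum-exp step is an assumption made here for numerical stability.

```python
import math
from typing import Sequence


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Log of a Gaussian density whose covariance is diagonal (var holds the diagonal)."""
    acc = 0.0
    for xi, mi, vi in zip(x, mean, var):
        acc += math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi
    return -0.5 * acc


def log_mixture_density(x: Sequence[float],
                        weights: Sequence[float],
                        means: Sequence[Sequence[float]],
                        variances: Sequence[Sequence[float]]) -> float:
    """log b_j(x) for one state j: log of sum over m of c_mj * N(x; mu_mj, U_mj)."""
    terms = [math.log(c) + log_gaussian_diag(x, mu, var)
             for c, mu, var in zip(weights, means, variances) if c > 0.0]
    if not terms:
        raise ValueError("at least one mixture weight must be positive")
    peak = max(terms)  # log-sum-exp: factor out the largest term before exponentiating
    return peak + math.log(sum(math.exp(t - peak) for t in terms))
```

In the configuration reported in the text, each model would hold 10 such states with M = 9 components per state; the sketch accepts any M.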
Abstract
Description
$$\hat{c}_l(m) = c_l(m) \cdot W_c(m) \qquad (2)$$
$$O_l = \{\hat{c}_l(m),\ \Delta\hat{c}_l(m)\} \qquad (4)$$
$$a_{ij} = 0, \quad j < i \ \text{or}\ j \geq i + 2 \qquad (5)$$
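
As a rough illustration of equations (2), (4) and (5), the sketch below builds observation vectors from a sequence of cepstral frames and marks which transitions a left-to-right model allows. Several parts are assumptions rather than content from this excerpt: the window W_c(m) is passed in by the caller because its form is not reproduced here, the delta cepstrum is taken as a gain-scaled first-order regression over K frames on either side (with K = 2 as an illustrative default and edge frames repeated at the boundaries), and all function names are hypothetical. Only G = 0.375 comes from the text.

```python
from typing import List, Sequence

Vector = List[float]


def weight_cepstrum(c: Sequence[float], w: Sequence[float]) -> Vector:
    """Eq. (2): element-wise product of the cepstral vector c_l(m) and the window W_c(m)."""
    if len(c) != len(w):
        raise ValueError("cepstral vector and window must have equal length")
    return [cm * wm for cm, wm in zip(c, w)]


def delta_cepstrum(frames: Sequence[Vector], l: int,
                   K: int = 2, G: float = 0.375) -> Vector:
    """Assumed delta form: G * sum over k in [-K, K] of k * frames[l + k]."""
    last = len(frames) - 1
    dim = len(frames[l])
    out = [0.0] * dim
    for k in range(-K, K + 1):
        src = frames[min(max(l + k, 0), last)]  # repeat the edge frame at boundaries
        for m in range(dim):
            out[m] += k * src[m]
    return [G * v for v in out]


def observation_vectors(cepstra: Sequence[Sequence[float]], w: Sequence[float],
                        K: int = 2, G: float = 0.375) -> List[Vector]:
    """Eq. (4): concatenate each weighted cepstral vector with its delta vector."""
    weighted = [weight_cepstrum(c, w) for c in cepstra]
    return [weighted[l] + delta_cepstrum(weighted, l, K, G)
            for l in range(len(weighted))]


def left_to_right_mask(n_states: int) -> List[List[bool]]:
    """Eq. (5): a_ij may be non-zero only when j == i or j == i + 1."""
    return [[j in (i, i + 1) for j in range(n_states)] for i in range(n_states)]
```

A training routine could use the mask to keep every disallowed transition probability at zero, which is what makes the model first order and left-to-right.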
Claims (22)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/132,430 US5509104A (en) | 1989-05-17 | 1993-10-06 | Speech recognition employing key word modeling and non-key word modeling |
US08/586,413 US5649057A (en) | 1989-05-17 | 1996-01-16 | Speech recognition employing key word modeling and non-key word modeling |
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US35328389A | 1989-05-17 | 1989-05-17 | |
US62577390A | 1990-12-07 | 1990-12-07 | |
US97774391A | 1991-11-16 | 1991-11-16 | |
US83500692A | 1992-02-12 | 1992-02-12 | |
US08/132,430 US5509104A (en) | 1989-05-17 | 1993-10-06 | Speech recognition employing key word modeling and non-key word modeling |
Related Parent Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US97774391A Continuation | 1989-05-17 | 1991-11-16 | |
US97774392A Continuation | 1989-05-17 | 1992-11-16 | |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/586,413 Division US5649057A (en) | 1989-05-17 | 1996-01-16 | Speech recognition employing key word modeling and non-key word modeling |
Publications (1)
Publication Number | Publication Date |
---|---|
US5509104A (en) | 1996-04-16 |
Family
ID=27502855
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US08/132,430 Expired - Lifetime US5509104A (en) | 1989-05-17 | 1993-10-06 | Speech recognition employing key word modeling and non-key word modeling |
Country Status (1)
Country | Link |
---|---|
US (1) | US5509104A (en) |
Cited By (70)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5613037A (en) * | 1993-12-21 | 1997-03-18 | Lucent Technologies Inc. | Rejection of non-digit strings for connected digit speech recognition |
US5680506A (en) * | 1994-12-29 | 1997-10-21 | Lucent Technologies Inc. | Apparatus and method for speech signal analysis |
US5710864A (en) * | 1994-12-29 | 1998-01-20 | Lucent Technologies Inc. | Systems, methods and articles of manufacture for improving recognition confidence in hypothesized keywords |
US5740318A (en) * | 1994-10-18 | 1998-04-14 | Kokusai Denshin Denwa Co., Ltd. | Speech endpoint detection method and apparatus and continuous speech recognition method and apparatus |
US5774628A (en) * | 1995-04-10 | 1998-06-30 | Texas Instruments Incorporated | Speaker-independent dynamic vocabulary and grammar in speech recognition |
EP0851404A2 (en) * | 1996-12-31 | 1998-07-01 | AT&T Corp. | System and method for enhanced intelligibility of voice messages |
US5797123A (en) * | 1996-10-01 | 1998-08-18 | Lucent Technologies Inc. | Method of key-phase detection and verification for flexible speech understanding |
US5859924A (en) * | 1996-07-12 | 1999-01-12 | Robotic Vision Systems, Inc. | Method and system for measuring object features |
US5930748A (en) * | 1997-07-11 | 1999-07-27 | Motorola, Inc. | Speaker identification system and method |
US5946653A (en) * | 1997-10-01 | 1999-08-31 | Motorola, Inc. | Speaker independent speech recognition system and method |
EP0947980A1 (en) * | 1998-04-02 | 1999-10-06 | Nec Corporation | Noise-rejecting speech recognition system and method |
US5970446A (en) * | 1997-11-25 | 1999-10-19 | At&T Corp | Selective noise/channel/coding models and recognizers for automatic speech recognition |
US6006181A (en) * | 1997-09-12 | 1999-12-21 | Lucent Technologies Inc. | Method and apparatus for continuous speech recognition using a layered, self-adjusting decoder network |
WO2000005709A1 (en) * | 1998-07-23 | 2000-02-03 | Siemens Aktiengesellschaft | Method and device for recognizing predetermined key words in spoken language |
US6055498A (en) * | 1996-10-02 | 2000-04-25 | Sri International | Method and apparatus for automatic text-independent grading of pronunciation for language instruction |
US6061654A (en) * | 1996-12-16 | 2000-05-09 | At&T Corp. | System and method of recognizing letters and numbers by either speech or touch tone recognition utilizing constrained confusion matrices |
US6075883A (en) * | 1996-11-12 | 2000-06-13 | Robotic Vision Systems, Inc. | Method and system for imaging an object or pattern |
US6122612A (en) * | 1997-11-20 | 2000-09-19 | At&T Corp | Check-sum based method and apparatus for performing speech recognition |
US6137863A (en) * | 1996-12-13 | 2000-10-24 | At&T Corp. | Statistical database correction of alphanumeric account numbers for speech recognition and touch-tone recognition |
US6141661A (en) * | 1997-10-17 | 2000-10-31 | At&T Corp | Method and apparatus for performing a grammar-pruning operation |
US6154579A (en) * | 1997-08-11 | 2000-11-28 | At&T Corp. | Confusion matrix based method and system for correcting misrecognized words appearing in documents generated by an optical character recognition technique |
US6157731A (en) * | 1998-07-01 | 2000-12-05 | Lucent Technologies Inc. | Signature verification method using hidden markov models |
US6195634B1 (en) | 1997-12-24 | 2001-02-27 | Nortel Networks Corporation | Selection of decoys for non-vocabulary utterances rejection |
US6205428B1 (en) | 1997-11-20 | 2001-03-20 | At&T Corp. | Confusion set-base method and apparatus for pruning a predetermined arrangement of indexed identifiers |
US6205261B1 (en) | 1998-02-05 | 2001-03-20 | At&T Corp. | Confusion set based method and system for correcting misrecognized words appearing in documents generated by an optical character recognition technique |
US6208965B1 (en) | 1997-11-20 | 2001-03-27 | At&T Corp. | Method and apparatus for performing a name acquisition based on speech recognition |
US6219453B1 (en) | 1997-08-11 | 2001-04-17 | At&T Corp. | Method and apparatus for performing an automatic correction of misrecognized words produced by an optical character recognition technique by using a Hidden Markov Model based algorithm |
US6223158B1 (en) | 1998-02-04 | 2001-04-24 | At&T Corporation | Statistical option generator for alpha-numeric pre-database speech recognition correction |
US6256611B1 (en) * | 1997-07-23 | 2001-07-03 | Nokia Mobile Phones Limited | Controlling a telecommunication service and a terminal |
US20020052742A1 (en) * | 2000-07-20 | 2002-05-02 | Chris Thrasher | Method and apparatus for generating and displaying N-best alternatives in a speech recognition system |
US6389392B1 (en) * | 1997-10-15 | 2002-05-14 | British Telecommunications Public Limited Company | Method and apparatus for speaker recognition via comparing an unknown input to reference data |
US6400805B1 (en) | 1998-06-15 | 2002-06-04 | At&T Corp. | Statistical database correction of alphanumeric identifiers for speech recognition and touch-tone recognition |
US20020116377A1 (en) * | 1998-11-13 | 2002-08-22 | Jason Adelman | Methods and apparatus for operating on non-text messages |
US6442520B1 (en) | 1999-11-08 | 2002-08-27 | Agere Systems Guardian Corp. | Method and apparatus for continuous speech recognition using a layered, self-adjusting decoded network |
US6535849B1 (en) * | 2000-01-18 | 2003-03-18 | Scansoft, Inc. | Method and system for generating semi-literal transcripts for speech recognition systems |
US20030055640A1 (en) * | 2001-05-01 | 2003-03-20 | Ramot University Authority For Applied Research & Industrial Development Ltd. | System and method for parameter estimation for pattern recognition |
US6539353B1 (en) | 1999-10-12 | 2003-03-25 | Microsoft Corporation | Confidence measures using sub-word-dependent weighting of sub-word confidence scores for robust speech recognition |
US6571210B2 (en) | 1998-11-13 | 2003-05-27 | Microsoft Corporation | Confidence measure system using a near-miss pattern |
US6591237B2 (en) * | 1996-12-12 | 2003-07-08 | Intel Corporation | Keyword recognition system and method |
US20030191625A1 (en) * | 1999-11-05 | 2003-10-09 | Gorin Allen Louis | Method and system for creating a named entity language model |
US6662159B2 (en) * | 1995-11-01 | 2003-12-09 | Canon Kabushiki Kaisha | Recognizing speech data using a state transition model |
EP1378885A2 (en) * | 2002-07-03 | 2004-01-07 | Pioneer Corporation | Word-spotting apparatus, word-spotting method, and word-spotting program |
US20040030556A1 (en) * | 1999-11-12 | 2004-02-12 | Bennett Ian M. | Speech based learning/training system using semantic decoding |
US6760699B1 (en) * | 2000-04-24 | 2004-07-06 | Lucent Technologies Inc. | Soft feature decoding in a distributed automatic speech recognition system for use over wireless channels |
US20050080614A1 (en) * | 1999-11-12 | 2005-04-14 | Bennett Ian M. | System & method for natural language processing of query answers |
US20050119896A1 (en) * | 1999-11-12 | 2005-06-02 | Bennett Ian M. | Adjustable resource based speech recognition system |
US7050977B1 (en) | 1999-11-12 | 2006-05-23 | Phoenix Solutions, Inc. | Speech-enabled server for internet website and method |
US20060122834A1 (en) * | 2004-12-03 | 2006-06-08 | Bennett Ian M | Emotion detection device & method for use in distributed systems |
US20070033003A1 (en) * | 2003-07-23 | 2007-02-08 | Nexidia Inc. | Spoken word spotting queries |
KR100663821B1 (en) * | 1997-01-06 | 2007-06-04 | Texas Instruments Incorporated | System and method for adding speech recognition capabilities to java |
US20070198261A1 (en) * | 2006-02-21 | 2007-08-23 | Sony Computer Entertainment Inc. | Voice recognition with parallel gender and age normalization |
US20070198263A1 (en) * | 2006-02-21 | 2007-08-23 | Sony Computer Entertainment Inc. | Voice recognition with speaker adaptation and registration with pitch |
US7263484B1 (en) | 2000-03-04 | 2007-08-28 | Georgia Tech Research Corporation | Phonetic searching |
US20080046243A1 (en) * | 1999-11-05 | 2008-02-21 | At&T Corp. | Method and system for automatic detecting morphemes in a task classification system using lattices |
US7630899B1 (en) | 1998-06-15 | 2009-12-08 | At&T Intellectual Property Ii, L.P. | Concise dynamic grammars using N-best selection |
US7698136B1 (en) * | 2003-01-28 | 2010-04-13 | Voxify, Inc. | Methods and apparatus for flexible speech recognition |
US7742918B1 (en) * | 2002-10-25 | 2010-06-22 | At&T Intellectual Property Ii, L.P. | Active learning for spoken language understanding |
US20100211376A1 (en) * | 2009-02-17 | 2010-08-19 | Sony Computer Entertainment Inc. | Multiple language voice recognition |
US20100211391A1 (en) * | 2009-02-17 | 2010-08-19 | Sony Computer Entertainment Inc. | Automatic computation streaming partition for voice recognition on multiple processors with limited memory |
US20100211387A1 (en) * | 2009-02-17 | 2010-08-19 | Sony Computer Entertainment Inc. | Speech processing with source location estimation using signals from two or more microphones |
US7970613B2 (en) | 2005-11-12 | 2011-06-28 | Sony Computer Entertainment Inc. | Method and system for Gaussian probability data bit reduction and computation |
US20110208521A1 (en) * | 2008-08-14 | 2011-08-25 | 21Ct, Inc. | Hidden Markov Model for Speech Processing with Training Method |
US20110246196A1 (en) * | 2010-03-30 | 2011-10-06 | Aspen Networks, Inc. | Integrated voice biometrics cloud security gateway |
US8392188B1 (en) | 1999-11-05 | 2013-03-05 | At&T Intellectual Property Ii, L.P. | Method and system for building a phonotactic model for domain independent speech recognition |
US20130294587A1 (en) * | 2012-05-03 | 2013-11-07 | Nexidia Inc. | Speaker adaptation |
US9118669B2 (en) | 2010-09-30 | 2015-08-25 | Alcatel Lucent | Method and apparatus for voice signature authentication |
US9153235B2 (en) | 2012-04-09 | 2015-10-06 | Sony Computer Entertainment Inc. | Text dependent speaker recognition with long-term feature based on functional data analysis |
US9767807B2 (en) | 2011-03-30 | 2017-09-19 | Ack3 Bionetics Pte Limited | Digital voice signature of transactions |
US10311874B2 (en) | 2017-09-01 | 2019-06-04 | 4Q Catalyst, LLC | Methods and systems for voice-based programming of a voice-controlled device |
CN112445897A (en) * | 2021-01-28 | 2021-03-05 | 京华信息科技股份有限公司 | Method, system, device and storage medium for large-scale classification and labeling of text data |
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4481593A (en) * | 1981-10-05 | 1984-11-06 | Exxon Corporation | Continuous speech recognition |
US4713777A (en) * | 1984-05-27 | 1987-12-15 | Exxon Research And Engineering Company | Speech recognition method having noise immunity |
US5218668A (en) * | 1984-09-28 | 1993-06-08 | Itt Corporation | Keyword recognition system and method using template concantenation model |
US4783804A (en) * | 1985-03-21 | 1988-11-08 | American Telephone And Telegraph Company, At&T Bell Laboratories | Hidden Markov model speech recognition arrangement |
US4977599A (en) * | 1985-05-29 | 1990-12-11 | International Business Machines Corporation | Speech recognition employing a set of Markov models that includes Markov models representing transitions to and from silence |
US4829577A (en) * | 1986-03-25 | 1989-05-09 | International Business Machines Corporation | Speech recognition method |
US4827521A (en) * | 1986-03-27 | 1989-05-02 | International Business Machines Corporation | Training of markov models used in a speech recognition system |
US4837831A (en) * | 1986-10-15 | 1989-06-06 | Dragon Systems, Inc. | Method for creating and using multiple-word sound models in speech recognition |
US4914703A (en) * | 1986-12-05 | 1990-04-03 | Dragon Systems, Inc. | Method for deriving acoustic models for use in speech recognition |
US5199077A (en) * | 1991-09-19 | 1993-03-30 | Xerox Corporation | Wordspotting for voice editing and indexing |
US5440662A (en) * | 1992-12-11 | 1995-08-08 | At&T Corp. | Keyword/non-keyword classification in isolated word speech recognition |
US5452397A (en) * | 1992-12-11 | 1995-09-19 | Texas Instruments Incorporated | Method and system for preventing entry of confusingly similar phases in a voice recognition system vocabulary list |
Non-Patent Citations (25)
Title |
---|
"A Network-Based Frame Synchronous Level Building Algorithm for Connected Word Recognition," by C-H. Lee et al., IEEE Int. Conf. Acous. Speech and Sig. Processing, vol. 1, pp. 410-413, Apr. 1988. |
"A Segmental K-means Training Procedure for Connected Word Recognition Based on Whole Word Reference Patterns," by L. R. Rabiner et al., AT&T Technical Journal, vol. 65, No. 3, pp. 21-31, May 1986. |
"Application of Hidden Markov Models to Automatic Speech Endpoint Detection," by Wilpon and Rabiner, Computer Speech and Language, vol. 2, 3/4 pp. 321-341, Dec. 1987. |
"Detecting and Locating Key Words in Continuous Speech Using Linear Predictive Coding," by Christiansen and Rushforth, IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP 25 No. 5, pp. 362-367. Oct. 1977. |
"High Performance Connected Digit Recognition Using Hidden Markov Models," by L. R. Rabiner et al., IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 1, pp. 119-122, Apr. 1988. |
"Keyword Recognition Using Template Concatenation," by Higgins and Wohlford, IEEE Int. Conf. Acous. Speech, and Signal Processing pp. 1233-1236, Mar. 1985. |
"On the Use of Instantaneous and Transitional Spectral Information in Speaker Recognition,"by F. K. Soong et al., IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP 36, No. 6, pp. 871-879, Jun. 1988. |
"The Quefrency Analysis of Time Series for Echoes," by B. Bogert et al., Proc. Symp. on Time Series Analysis, Ch. 15, pp. 209-243, 1963. |
"The Use of Bandpass Filtering in Speech Recognition," IEEE Transactions on Acoustics, Speech and Signal Processing, by B. Juang et al., ASSP 35, No. 7, pp. 947-954, Jul. 1987. |
Digital Processing of Speech Signals, by L. R. Rabiner et al., Prentice Hall, pp. 356-372 and 398-401 (1978). |
Digital Processing of Speech Signals, by L. R. Rabiner et al., Prentice Hall, p. 121 (1978). |
Markowitz, "Keyword Spotting in Speech", AI Expert, pp. 21-25. |
Wilpon et al., "Automatic Recognition of Keywords in Unconstrained Speech Using Hidden Markov Models", IEEE Trans. on Acoustics Speech and Signal Proc., vol. 38 , No. 11, pp. 1870-1878. |
Cited By (148)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5613037A (en) * | 1993-12-21 | 1997-03-18 | Lucent Technologies Inc. | Rejection of non-digit strings for connected digit speech recognition |
US5740318A (en) * | 1994-10-18 | 1998-04-14 | Kokusai Denshin Denwa Co., Ltd. | Speech endpoint detection method and apparatus and continuous speech recognition method and apparatus |
US5680506A (en) * | 1994-12-29 | 1997-10-21 | Lucent Technologies Inc. | Apparatus and method for speech signal analysis |
US5710864A (en) * | 1994-12-29 | 1998-01-20 | Lucent Technologies Inc. | Systems, methods and articles of manufacture for improving recognition confidence in hypothesized keywords |
US5774628A (en) * | 1995-04-10 | 1998-06-30 | Texas Instruments Incorporated | Speaker-independent dynamic vocabulary and grammar in speech recognition |
US6662159B2 (en) * | 1995-11-01 | 2003-12-09 | Canon Kabushiki Kaisha | Recognizing speech data using a state transition model |
US5859924A (en) * | 1996-07-12 | 1999-01-12 | Robotic Vision Systems, Inc. | Method and system for measuring object features |
US5797123A (en) * | 1996-10-01 | 1998-08-18 | Lucent Technologies Inc. | Method of key-phase detection and verification for flexible speech understanding |
US6055498A (en) * | 1996-10-02 | 2000-04-25 | Sri International | Method and apparatus for automatic text-independent grading of pronunciation for language instruction |
US6226611B1 (en) | 1996-10-02 | 2001-05-01 | Sri International | Method and system for automatic text-independent grading of pronunciation for language instruction |
US6075883A (en) * | 1996-11-12 | 2000-06-13 | Robotic Vision Systems, Inc. | Method and system for imaging an object or pattern |
US6603874B1 (en) | 1996-11-12 | 2003-08-05 | Robotic Vision Systems, Inc. | Method and system for imaging an object or pattern |
US20030215127A1 (en) * | 1996-11-12 | 2003-11-20 | Howard Stern | Method and system for imaging an object or pattern |
US6591237B2 (en) * | 1996-12-12 | 2003-07-08 | Intel Corporation | Keyword recognition system and method |
US6137863A (en) * | 1996-12-13 | 2000-10-24 | At&T Corp. | Statistical database correction of alphanumeric account numbers for speech recognition and touch-tone recognition |
US6061654A (en) * | 1996-12-16 | 2000-05-09 | At&T Corp. | System and method of recognizing letters and numbers by either speech or touch tone recognition utilizing constrained confusion matrices |
EP0851404A3 (en) * | 1996-12-31 | 1998-12-30 | AT&T Corp. | System and method for enhanced intelligibility of voice messages |
US5848130A (en) * | 1996-12-31 | 1998-12-08 | At&T Corp | System and method for enhanced intelligibility of voice messages |
EP0851404A2 (en) * | 1996-12-31 | 1998-07-01 | AT&T Corp. | System and method for enhanced intelligibility of voice messages |
KR100663821B1 (en) * | 1997-01-06 | 2007-06-04 | 텍사스 인스트루먼츠 인코포레이티드 | System and method for adding speech recognition capabilities to java |
US5930748A (en) * | 1997-07-11 | 1999-07-27 | Motorola, Inc. | Speaker identification system and method |
US6256611B1 (en) * | 1997-07-23 | 2001-07-03 | Nokia Mobile Phones Limited | Controlling a telecommunication service and a terminal |
US6154579A (en) * | 1997-08-11 | 2000-11-28 | At&T Corp. | Confusion matrix based method and system for correcting misrecognized words appearing in documents generated by an optical character recognition technique |
US6219453B1 (en) | 1997-08-11 | 2001-04-17 | At&T Corp. | Method and apparatus for performing an automatic correction of misrecognized words produced by an optical character recognition technique by using a Hidden Markov Model based algorithm |
US6006181A (en) * | 1997-09-12 | 1999-12-21 | Lucent Technologies Inc. | Method and apparatus for continuous speech recognition using a layered, self-adjusting decoder network |
US5946653A (en) * | 1997-10-01 | 1999-08-31 | Motorola, Inc. | Speaker independent speech recognition system and method |
US6389392B1 (en) * | 1997-10-15 | 2002-05-14 | British Telecommunications Public Limited Company | Method and apparatus for speaker recognition via comparing an unknown input to reference data |
US6141661A (en) * | 1997-10-17 | 2000-10-31 | At&T Corp | Method and apparatus for performing a grammar-pruning operation |
US6208965B1 (en) | 1997-11-20 | 2001-03-27 | At&T Corp. | Method and apparatus for performing a name acquisition based on speech recognition |
US6205428B1 (en) | 1997-11-20 | 2001-03-20 | At&T Corp. | Confusion set-base method and apparatus for pruning a predetermined arrangement of indexed identifiers |
US6122612A (en) * | 1997-11-20 | 2000-09-19 | At&T Corp | Check-sum based method and apparatus for performing speech recognition |
USRE45289E1 (en) | 1997-11-25 | 2014-12-09 | At&T Intellectual Property Ii, L.P. | Selective noise/channel/coding models and recognizers for automatic speech recognition |
US5970446A (en) * | 1997-11-25 | 1999-10-19 | At&T Corp | Selective noise/channel/coding models and recognizers for automatic speech recognition |
US6195634B1 (en) | 1997-12-24 | 2001-02-27 | Nortel Networks Corporation | Selection of decoys for non-vocabulary utterances rejection |
US6223158B1 (en) | 1998-02-04 | 2001-04-24 | At&T Corporation | Statistical option generator for alpha-numeric pre-database speech recognition correction |
US6205261B1 (en) | 1998-02-05 | 2001-03-20 | At&T Corp. | Confusion set based method and system for correcting misrecognized words appearing in documents generated by an optical character recognition technique |
EP0947980A1 (en) * | 1998-04-02 | 1999-10-06 | Nec Corporation | Noise-rejecting speech recognition system and method |
US7937260B1 (en) | 1998-06-15 | 2011-05-03 | At&T Intellectual Property Ii, L.P. | Concise dynamic grammars using N-best selection |
US20110202343A1 (en) * | 1998-06-15 | 2011-08-18 | At&T Intellectual Property I, L.P. | Concise dynamic grammars using n-best selection |
US9286887B2 (en) | 1998-06-15 | 2016-03-15 | At&T Intellectual Property Ii, L.P. | Concise dynamic grammars using N-best selection |
US8682665B2 (en) | 1998-06-15 | 2014-03-25 | At&T Intellectual Property Ii, L.P. | Concise dynamic grammars using N-best selection |
US7630899B1 (en) | 1998-06-15 | 2009-12-08 | At&T Intellectual Property Ii, L.P. | Concise dynamic grammars using N-best selection |
US6400805B1 (en) | 1998-06-15 | 2002-06-04 | At&T Corp. | Statistical database correction of alphanumeric identifiers for speech recognition and touch-tone recognition |
US6157731A (en) * | 1998-07-01 | 2000-12-05 | Lucent Technologies Inc. | Signature verification method using hidden markov models |
WO2000005709A1 (en) * | 1998-07-23 | 2000-02-03 | Siemens Aktiengesellschaft | Method and device for recognizing predetermined key words in spoken language |
US20020116377A1 (en) * | 1998-11-13 | 2002-08-22 | Jason Adelman | Methods and apparatus for operating on non-text messages |
US7685102B2 (en) * | 1998-11-13 | 2010-03-23 | Avaya Inc. | Methods and apparatus for operating on non-text messages |
US6571210B2 (en) | 1998-11-13 | 2003-05-27 | Microsoft Corporation | Confidence measure system using a near-miss pattern |
US6539353B1 (en) | 1999-10-12 | 2003-03-25 | Microsoft Corporation | Confidence measures using sub-word-dependent weighting of sub-word confidence scores for robust speech recognition |
US20080046243A1 (en) * | 1999-11-05 | 2008-02-21 | At&T Corp. | Method and system for automatic detecting morphemes in a task classification system using lattices |
US8909529B2 (en) | 1999-11-05 | 2014-12-09 | At&T Intellectual Property Ii, L.P. | Method and system for automatically detecting morphemes in a task classification system using lattices |
US20080288244A1 (en) * | 1999-11-05 | 2008-11-20 | At&T Corp. | Method and system for automatically detecting morphemes in a task classification system using lattices |
US7440897B1 (en) | 1999-11-05 | 2008-10-21 | At&T Corp. | Method and system for automatically detecting morphemes in a task classification system using lattices |
US20080177544A1 (en) * | 1999-11-05 | 2008-07-24 | At&T Corp. | Method and system for automatic detecting morphemes in a task classification system using lattices |
US7620548B2 (en) | 1999-11-05 | 2009-11-17 | At&T Intellectual Property Ii, L.P. | Method and system for automatic detecting morphemes in a task classification system using lattices |
US20030191625A1 (en) * | 1999-11-05 | 2003-10-09 | Gorin Allen Louis | Method and system for creating a named entity language model |
US8392188B1 (en) | 1999-11-05 | 2013-03-05 | At&T Intellectual Property Ii, L.P. | Method and system for building a phonotactic model for domain independent speech recognition |
US8010361B2 (en) | 1999-11-05 | 2011-08-30 | At&T Intellectual Property Ii, L.P. | Method and system for automatically detecting morphemes in a task classification system using lattices |
US8200491B2 (en) | 1999-11-05 | 2012-06-12 | At&T Intellectual Property Ii, L.P. | Method and system for automatically detecting morphemes in a task classification system using lattices |
US8612212B2 (en) | 1999-11-05 | 2013-12-17 | At&T Intellectual Property Ii, L.P. | Method and system for automatically detecting morphemes in a task classification system using lattices |
US9514126B2 (en) | 1999-11-05 | 2016-12-06 | At&T Intellectual Property Ii, L.P. | Method and system for automatically detecting morphemes in a task classification system using lattices |
US6442520B1 (en) | 1999-11-08 | 2002-08-27 | Agere Systems Guardian Corp. | Method and apparatus for continuous speech recognition using a layered, self-adjusting decoded network |
US20070185717A1 (en) * | 1999-11-12 | 2007-08-09 | Bennett Ian M | Method of interacting through speech with a web-connected server |
US20080255845A1 (en) * | 1999-11-12 | 2008-10-16 | Bennett Ian M | Speech Based Query System Using Semantic Decoding |
US8352277B2 (en) | 1999-11-12 | 2013-01-08 | Phoenix Solutions, Inc. | Method of interacting through speech with a web-connected server |
US7203646B2 (en) | 1999-11-12 | 2007-04-10 | Phoenix Solutions, Inc. | Distributed internet based speech recognition system with natural language support |
US7225125B2 (en) | 1999-11-12 | 2007-05-29 | Phoenix Solutions, Inc. | Speech recognition system trained with regional speech characteristics |
US20060235696A1 (en) * | 1999-11-12 | 2006-10-19 | Bennett Ian M | Network based interactive speech recognition system |
US20070179789A1 (en) * | 1999-11-12 | 2007-08-02 | Bennett Ian M | Speech Recognition System With Support For Variable Portable Devices |
US20060200353A1 (en) * | 1999-11-12 | 2006-09-07 | Bennett Ian M | Distributed Internet Based Speech Recognition System With Natural Language Support |
US8229734B2 (en) | 1999-11-12 | 2012-07-24 | Phoenix Solutions, Inc. | Semantic decoding of user queries |
US7050977B1 (en) | 1999-11-12 | 2006-05-23 | Phoenix Solutions, Inc. | Speech-enabled server for internet website and method |
US20050144001A1 (en) * | 1999-11-12 | 2005-06-30 | Bennett Ian M. | Speech recognition system trained with regional speech characteristics |
US7277854B2 (en) | 1999-11-12 | 2007-10-02 | Phoenix Solutions, Inc | Speech recognition system interactive agent |
US20050144004A1 (en) * | 1999-11-12 | 2005-06-30 | Bennett Ian M. | Speech recognition system interactive agent |
US7725307B2 (en) | 1999-11-12 | 2010-05-25 | Phoenix Solutions, Inc. | Query engine for processing voice based queries including semantic decoding |
US20050119896A1 (en) * | 1999-11-12 | 2005-06-02 | Bennett Ian M. | Adjustable resource based speech recognition system |
US20080052078A1 (en) * | 1999-11-12 | 2008-02-28 | Bennett Ian M | Statistical Language Model Trained With Semantic Variants |
US7376556B2 (en) | 1999-11-12 | 2008-05-20 | Phoenix Solutions, Inc. | Method for processing speech signal features for streaming transport |
US7392185B2 (en) | 1999-11-12 | 2008-06-24 | Phoenix Solutions, Inc. | Speech based learning/training system using semantic decoding |
US20050086059A1 (en) * | 1999-11-12 | 2005-04-21 | Bennett Ian M. | Partial speech processing device & method for use in distributed systems |
US7912702B2 (en) | 1999-11-12 | 2011-03-22 | Phoenix Solutions, Inc. | Statistical language model trained with semantic variants |
US20080215327A1 (en) * | 1999-11-12 | 2008-09-04 | Bennett Ian M | Method For Processing Speech Data For A Distributed Recognition System |
US7139714B2 (en) | 1999-11-12 | 2006-11-21 | Phoenix Solutions, Inc. | Adjustable resource based speech recognition system |
US20050080614A1 (en) * | 1999-11-12 | 2005-04-14 | Bennett Ian M. | System & method for natural language processing of query answers |
US8762152B2 (en) | 1999-11-12 | 2014-06-24 | Nuance Communications, Inc. | Speech recognition system interactive agent |
US20080300878A1 (en) * | 1999-11-12 | 2008-12-04 | Bennett Ian M | Method For Transporting Speech Data For A Distributed Recognition System |
US7873519B2 (en) | 1999-11-12 | 2011-01-18 | Phoenix Solutions, Inc. | Natural language speech lattice containing semantic variants |
US7831426B2 (en) | 1999-11-12 | 2010-11-09 | Phoenix Solutions, Inc. | Network based interactive speech recognition system |
US20090157401A1 (en) * | 1999-11-12 | 2009-06-18 | Bennett Ian M | Semantic Decoding of User Queries |
US7555431B2 (en) | 1999-11-12 | 2009-06-30 | Phoenix Solutions, Inc. | Method for processing speech using dynamic grammars |
US20040030556A1 (en) * | 1999-11-12 | 2004-02-12 | Bennett Ian M. | Speech based learning/training system using semantic decoding |
US7624007B2 (en) | 1999-11-12 | 2009-11-24 | Phoenix Solutions, Inc. | System and method for natural language processing of sentence based queries |
US9076448B2 (en) | 1999-11-12 | 2015-07-07 | Nuance Communications, Inc. | Distributed real time speech recognition system |
US7647225B2 (en) | 1999-11-12 | 2010-01-12 | Phoenix Solutions, Inc. | Adjustable resource based speech recognition system |
US7657424B2 (en) | 1999-11-12 | 2010-02-02 | Phoenix Solutions, Inc. | System and method for processing sentence based queries |
US7672841B2 (en) | 1999-11-12 | 2010-03-02 | Phoenix Solutions, Inc. | Method for processing speech data for a distributed recognition system |
US9190063B2 (en) | 1999-11-12 | 2015-11-17 | Nuance Communications, Inc. | Multi-language speech recognition system |
US7729904B2 (en) | 1999-11-12 | 2010-06-01 | Phoenix Solutions, Inc. | Partial speech processing device and method for use in distributed systems |
US7698131B2 (en) | 1999-11-12 | 2010-04-13 | Phoenix Solutions, Inc. | Speech recognition system for client devices having differing computing capabilities |
US7702508B2 (en) | 1999-11-12 | 2010-04-20 | Phoenix Solutions, Inc. | System and method for natural language processing of query answers |
US7725320B2 (en) | 1999-11-12 | 2010-05-25 | Phoenix Solutions, Inc. | Internet based speech recognition system with dynamic grammars |
US7725321B2 (en) | 1999-11-12 | 2010-05-25 | Phoenix Solutions, Inc. | Speech based query system using semantic decoding |
US6535849B1 (en) * | 2000-01-18 | 2003-03-18 | Scansoft, Inc. | Method and system for generating semi-literal transcripts for speech recognition systems |
US7475065B1 (en) | 2000-03-04 | 2009-01-06 | Georgia Tech Research Corporation | Phonetic searching |
US7769587B2 (en) | 2000-03-04 | 2010-08-03 | Georgia Tech Research Corporation | Phonetic searching |
US7263484B1 (en) | 2000-03-04 | 2007-08-28 | Georgia Tech Research Corporation | Phonetic searching |
US7313521B1 (en) | 2000-03-04 | 2007-12-25 | Georgia Tech Research Corporation | Phonetic searching |
US7324939B1 (en) | 2000-03-04 | 2008-01-29 | Georgia Tech Research Corporation | Phonetic searching |
US7406415B1 (en) | 2000-03-04 | 2008-07-29 | Georgia Tech Research Corporation | Phonetic searching |
US20090083033A1 (en) * | 2000-03-04 | 2009-03-26 | Georgia Tech Research Corporation | Phonetic Searching |
US6760699B1 (en) * | 2000-04-24 | 2004-07-06 | Lucent Technologies Inc. | Soft feature decoding in a distributed automatic speech recognition system for use over wireless channels |
US7162423B2 (en) | 2000-07-20 | 2007-01-09 | Microsoft Corporation | Method and apparatus for generating and displaying N-Best alternatives in a speech recognition system |
US6856956B2 (en) | 2000-07-20 | 2005-02-15 | Microsoft Corporation | Method and apparatus for generating and displaying N-best alternatives in a speech recognition system |
US20050091054A1 (en) * | 2000-07-20 | 2005-04-28 | Microsoft Corporation | Method and apparatus for generating and displaying N-Best alternatives in a speech recognition system |
US20020052742A1 (en) * | 2000-07-20 | 2002-05-02 | Chris Thrasher | Method and apparatus for generating and displaying N-best alternatives in a speech recognition system |
US20030055640A1 (en) * | 2001-05-01 | 2003-03-20 | Ramot University Authority For Applied Research & Industrial Development Ltd. | System and method for parameter estimation for pattern recognition |
EP1378885A2 (en) * | 2002-07-03 | 2004-01-07 | Pioneer Corporation | Word-spotting apparatus, word-spotting method, and word-spotting program |
US20040006470A1 (en) * | 2002-07-03 | 2004-01-08 | Pioneer Corporation | Word-spotting apparatus, word-spotting method, and word-spotting program |
EP1378885A3 (en) * | 2002-07-03 | 2004-05-26 | Pioneer Corporation | Word-spotting apparatus, word-spotting method, and word-spotting program |
US7742918B1 (en) * | 2002-10-25 | 2010-06-22 | At&T Intellectual Property Ii, L.P. | Active learning for spoken language understanding |
US7698136B1 (en) * | 2003-01-28 | 2010-04-13 | Voxify, Inc. | Methods and apparatus for flexible speech recognition |
US20070033003A1 (en) * | 2003-07-23 | 2007-02-08 | Nexidia Inc. | Spoken word spotting queries |
US7904296B2 (en) * | 2003-07-23 | 2011-03-08 | Nexidia Inc. | Spoken word spotting queries |
US20060122834A1 (en) * | 2004-12-03 | 2006-06-08 | Bennett Ian M | Emotion detection device & method for use in distributed systems |
US7970613B2 (en) | 2005-11-12 | 2011-06-28 | Sony Computer Entertainment Inc. | Method and system for Gaussian probability data bit reduction and computation |
US7778831B2 (en) | 2006-02-21 | 2010-08-17 | Sony Computer Entertainment Inc. | Voice recognition with dynamic filter bank adjustment based on speaker categorization determined from runtime pitch |
US8010358B2 (en) | 2006-02-21 | 2011-08-30 | Sony Computer Entertainment Inc. | Voice recognition with parallel gender and age normalization |
US8050922B2 (en) | 2006-02-21 | 2011-11-01 | Sony Computer Entertainment Inc. | Voice recognition with dynamic filter bank adjustment based on speaker categorization |
US20070198263A1 (en) * | 2006-02-21 | 2007-08-23 | Sony Computer Entertainment Inc. | Voice recognition with speaker adaptation and registration with pitch |
US20070198261A1 (en) * | 2006-02-21 | 2007-08-23 | Sony Computer Entertainment Inc. | Voice recognition with parallel gender and age normalization |
US20110208521A1 (en) * | 2008-08-14 | 2011-08-25 | 21Ct, Inc. | Hidden Markov Model for Speech Processing with Training Method |
US9020816B2 (en) | 2008-08-14 | 2015-04-28 | 21Ct, Inc. | Hidden markov model for speech processing with training method |
US8442833B2 (en) | 2009-02-17 | 2013-05-14 | Sony Computer Entertainment Inc. | Speech processing with source location estimation using signals from two or more microphones |
US8442829B2 (en) | 2009-02-17 | 2013-05-14 | Sony Computer Entertainment Inc. | Automatic computation streaming partition for voice recognition on multiple processors with limited memory |
US8788256B2 (en) | 2009-02-17 | 2014-07-22 | Sony Computer Entertainment Inc. | Multiple language voice recognition |
US20100211376A1 (en) * | 2009-02-17 | 2010-08-19 | Sony Computer Entertainment Inc. | Multiple language voice recognition |
US20100211391A1 (en) * | 2009-02-17 | 2010-08-19 | Sony Computer Entertainment Inc. | Automatic computation streaming partition for voice recognition on multiple processors with limited memory |
US20100211387A1 (en) * | 2009-02-17 | 2010-08-19 | Sony Computer Entertainment Inc. | Speech processing with source location estimation using signals from two or more microphones |
US20110246196A1 (en) * | 2010-03-30 | 2011-10-06 | Aspen Networks, Inc. | Integrated voice biometrics cloud security gateway |
US9412381B2 (en) * | 2010-03-30 | 2016-08-09 | Ack3 Bionetics Private Ltd. | Integrated voice biometrics cloud security gateway |
US9118669B2 (en) | 2010-09-30 | 2015-08-25 | Alcatel Lucent | Method and apparatus for voice signature authentication |
US9767807B2 (en) | 2011-03-30 | 2017-09-19 | Ack3 Bionetics Pte Limited | Digital voice signature of transactions |
US9153235B2 (en) | 2012-04-09 | 2015-10-06 | Sony Computer Entertainment Inc. | Text dependent speaker recognition with long-term feature based on functional data analysis |
US9001976B2 (en) * | 2012-05-03 | 2015-04-07 | Nexidia, Inc. | Speaker adaptation |
US20130294587A1 (en) * | 2012-05-03 | 2013-11-07 | Nexidia Inc. | Speaker adaptation |
US10311874B2 (en) | 2017-09-01 | 2019-06-04 | 4Q Catalyst, LLC | Methods and systems for voice-based programming of a voice-controlled device |
CN112445897A (en) * | 2021-01-28 | 2021-03-05 | 京华信息科技股份有限公司 | Method, system, device and storage medium for large-scale classification and labeling of text data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5509104A (en) | Speech recognition employing key word modeling and non-key word modeling | |
US5649057A (en) | Speech recognition employing key word modeling and non-key word modeling | |
Wilpon et al. | Automatic recognition of keywords in unconstrained speech using hidden Markov models | |
US5199077A (en) | Wordspotting for voice editing and indexing | |
Li et al. | Robust endpoint detection and energy normalization for real-time speech and speaker recognition | |
US5390278A (en) | Phoneme based speech recognition | |
JP4141495B2 (en) | Method and apparatus for speech recognition using optimized partial probability mixture sharing | |
KR101120716B1 (en) | Automatic identification of telephone callers based on voice characteristics | |
Wilpon et al. | Application of hidden Markov models for recognition of a limited set of words in unconstrained speech | |
JPH06214587A (en) | Predesignated word spotting subsystem and previous word spotting method | |
US7617104B2 (en) | Method of speech recognition using hidden trajectory Hidden Markov Models | |
EP1385147A2 (en) | Method of speech recognition using time-dependent interpolation and hidden dynamic value classes | |
JPH09212188A (en) | Voice recognition method using decoded state group having conditional likelihood | |
Boite et al. | A new approach towards keyword spotting. | |
Deligne et al. | Inference of variable-length acoustic units for continuous speech recognition | |
Li | A detection approach to search-space reduction for HMM state alignment in speaker verification | |
Steinbiss et al. | Continuous speech dictation—From theory to practice | |
Marcus | A novel algorithm for HMM word spotting performance evaluation and error analysis | |
JP2731133B2 (en) | Continuous speech recognition device | |
Fakotakis et al. | A continuous HMM text-independent speaker recognition system based on vowel spotting. | |
D'Orta et al. | A speech recognition system for the Italian language | |
KR100304788B1 (en) | Method for telephone number information using continuous speech recognition | |
JP2986703B2 (en) | Voice recognition device | |
Lleida Solano et al. | Telemaco-a real time keyword spotting application for voice dialing | |
Pawate et al. | A new method for segmenting continuous speech |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | FPAY | Fee payment | Year of fee payment: 4 |
| | FEPP | Fee payment procedure | Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | AS | Assignment | Owner name: JPMORGAN CHASE BANK, AS COLLATERAL AGENT, TEXAS Free format text: SECURITY AGREEMENT;ASSIGNOR:LUCENT TECHNOLOGIES INC.;REEL/FRAME:014402/0797 Effective date: 20030528 |
| | FPAY | Fee payment | Year of fee payment: 8 |
| | FEPP | Fee payment procedure | Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | AS | Assignment | Owner name: LUCENT TECHNOLOGIES INC., NEW JERSEY Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS;ASSIGNOR:JPMORGAN CHASE BANK, N.A. (FORMERLY KNOWN AS THE CHASE MANHATTAN BANK), AS ADMINISTRATIVE AGENT;REEL/FRAME:018590/0832 Effective date: 20061130 |
| | FPAY | Fee payment | Year of fee payment: 12 |
| | AS | Assignment | Owner name: CREDIT SUISSE AG, NEW YORK Free format text: SECURITY INTEREST;ASSIGNOR:ALCATEL-LUCENT USA INC.;REEL/FRAME:030510/0627 Effective date: 20130130 |
| | AS | Assignment | Owner name: ALCATEL-LUCENT USA INC., NEW JERSEY Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG;REEL/FRAME:033950/0261 Effective date: 20140819 |