US6629072B1 - Method of an arrangement for speech recognition with speech velocity adaptation
- Publication number
- US6629072B1 (application US09/649,675)
- Authority
- US
- United States
- Prior art keywords
- speech
- velocity
- recognition
- speech velocity
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime, expires
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/221—Announcement of recognition results
Definitions
- the invention relates to a method of and an arrangement for speech recognition.
- For the execution of speech recognition it is necessary for a user to supply respective speech utterances to a speech recognizer. A plurality of criteria influence the quality of the recognition result produced from the speech utterance. A user is often not aware of these criteria. Only the experienced user of a speech recognition device succeeds in keeping the error rate in the recognition process so low that an acceptable result is achieved. Speech recognition devices have been developed such that different speakers can also produce speech utterances, which are then recognized by the speech recognition system. Such speech recognizers are denoted as speaker-independent speech recognition systems.
- WO 87/07460 describes a telephone-based speech recognition system in which the speech recognizer informs a user that no appropriate word was found in the vocabulary. The user is requested to repeat the speech utterance. When the speech recognition system is supplied with a speech utterance that is too quiet or distorted, the user is requested to speak into the microphone more loudly.
- The fact that each speech recognizer is based on an acoustic model, which in turn is based on an average speech velocity, is not taken into account.
- a deviation of the user's speech velocity from the average speech velocity of the acoustic model considerably increases the error rate during the recognition process.
- The object is achieved in that the speech velocity is measured and the user is informed of it.
- For executing speech recognition, a user produces a speech utterance with an appropriate velocity.
- Acoustic speech models, which are based on an average velocity, are used for the recognition process. It is necessary to make an adaptation for a deviating speech velocity. With smaller deviations of the speech velocity from the average speech velocity of the acoustic model, an adaptation is possible, but it leads to a degraded evaluation of hypotheses in the recognition process as a result of the time distortion resulting from the adaptation. With larger deviations from the average speech velocity, an adaptation is no longer possible, because the models cannot be run through with arbitrarily high velocity. Furthermore, when users speak fast, most of them tend to swallow short words or word endings. Such errors cannot be reliably recognized by the speech recognition system.
- the user's speech velocity is measured and the user is informed thereof via an output means.
- the user is then persuaded to stick to an optimized speech velocity, which is oriented to the average speech velocity of the speech model.
- Output means are, for example, LEDs for which a respective color (green) shows an acceptable speech velocity and another color (red) shows an unacceptable deviation.
- Another possibility is the display of a number value by which the user is informed of the range in which this number value is to lie. With an exclusively audio-based communication, the user is informed by means of a warning signal that the speech velocity lies outside an acceptable range.
- the measured speech velocity can advantageously also be applied to the speech recognizer as a confidence measure or for controlling the search process.
- The speech recognizer is then supplied with a measure by which it can decide whether the speech velocity lies within the respective limits, or whether an excessively high speech velocity is to be taken into account during the recognition process. The same holds for an excessively low speech velocity.
- the speech velocity is determined by means of a suitable measure.
- a speech recognition system is to be trained before it can perform a recognition. Therefore, it is important to take the speech velocity into account and announce this to the user already during the training.
- A measure for the speech velocity is, for example, the number of spoken and also recognized words per time unit. A more accurate measure, however, is the number of recognized phonemes per frame, where a frame is a predefined time interval.
- The announcement whether the speech velocity lies in the acceptable range may be linked to the exceeding of an experimentally determined threshold value. The user is then informed only when the speech velocity is too high, so that he is not distracted by a message that the speech velocity lies in the acceptable range.
- The threshold value may also be determined during the recognition process, from respective measures being exceeded or undershot during that process.
- a particular advantage of this invention is obtained from the learning process, which a user is to undergo.
- the user makes a great effort to attain a high efficiency when using a speech recognition system. Since his speech velocity is displayed relative to the average speech velocity of the acoustic model, he consequently learns to adapt his speech velocity and thereby achieves a low error rate.
- the object of the invention is achieved by a speech recognition device in which a measuring unit determines a speech velocity and informs the user thereof by means of an output unit.
- FIG. 1 shows a speech recognition device.
- The speech utterances are produced by a user or speaker and fed to a microphone 6.
- These analog speech data are applied to an input unit 5, which performs a digitization of the speech data.
- These digital speech data are then applied to the speech recognizer 1.
- There the speech recognition process is carried out.
- The measuring unit 2 is supplied with the recognized phonemes and/or words. By means of the recognized phonemes and/or words, the speech velocity is determined in the measuring unit 2.
- The measuring unit 2 is connected to the output unit 3 and the speech recognizer 1.
- the measured speech velocity is applied to the output unit 3 and to the speech recognizer 1 as the case may be.
- the speech recognizer 1 applies the recognition result, for example, to an interface unit 4 for a text-processing program, which shows the recognized words on a monitor.
- An acoustic model is available to the speech recognizer 1 for the recognition result to be generated.
- This acoustic model is based on an average speech velocity. This average speech velocity is either compared with the measured speech velocity in the speech recognizer 1, or applied directly by the speech recognizer 1 to the output unit 3 for a direct comparison with the measured speech velocity in the output unit 3.
- a respective time unit is used as a base in the measuring unit.
- When phonemes are measured, they are counted per frame; when words are measured, they are counted per second.
- the type of measurement used for measuring the speech velocity depends on the application.
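The measurement and feedback steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the class `VelocityMonitor`, its method names, the model average of 0.10 phonemes per frame, and the 25% tolerance are all hypothetical values chosen for the example. The text only states that the velocity is measured (words per second or phonemes per frame), compared with the average velocity of the acoustic model, and reported to the user, for example through a green or red LED.

```python
from dataclasses import dataclass


@dataclass
class VelocityMonitor:
    """Illustrative stand-in for measuring unit 2 and output unit 3 (hypothetical names)."""

    model_rate: float        # average phonemes per frame assumed by the acoustic model
    tolerance: float = 0.25  # assumed acceptable relative deviation, not taken from the source

    def phonemes_per_frame(self, phoneme_count: int, frame_count: int) -> float:
        """Speech velocity as recognized phonemes per frame."""
        if frame_count <= 0:
            raise ValueError("frame_count must be positive")
        return phoneme_count / frame_count

    def deviation(self, measured_rate: float) -> float:
        """Relative deviation of the measured velocity from the model average."""
        return (measured_rate - self.model_rate) / self.model_rate

    def led_color(self, measured_rate: float) -> str:
        """Green for an acceptable velocity, red for an unacceptable deviation."""
        within_limits = abs(self.deviation(measured_rate)) <= self.tolerance
        return "green" if within_limits else "red"

    def too_fast(self, measured_rate: float) -> bool:
        """True only when the velocity exceeds the upper limit (warn-on-excess mode)."""
        return self.deviation(measured_rate) > self.tolerance


def words_per_second(word_count: int, duration_s: float) -> float:
    """Coarser alternative measure: recognized words per second."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return word_count / duration_s


# Example values are invented for illustration only.
monitor = VelocityMonitor(model_rate=0.10)
normal = monitor.phonemes_per_frame(phoneme_count=11, frame_count=100)
fast = monitor.phonemes_per_frame(phoneme_count=18, frame_count=100)
slow = monitor.phonemes_per_frame(phoneme_count=5, frame_count=100)
print(monitor.led_color(normal), monitor.led_color(fast), monitor.led_color(slow))
```

The `too_fast` check corresponds to the variant in which the user is notified only when the velocity is too high, while `led_color` flags deviations in both directions. In a real device the threshold would be determined experimentally or during recognition, as the description notes.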
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE19941227 | 1999-08-30 | ||
DE19941227A DE19941227A1 (en) | 1999-08-30 | 1999-08-30 | Method and arrangement for speech recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
US6629072B1 (en) | 2003-09-30 |
Family
ID=7920161
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/649,675 Expired - Lifetime US6629072B1 (en) | 1999-08-30 | 2000-08-28 | Method of an arrangement for speech recognition with speech velocity adaptation |
Country Status (4)
Country | Link |
---|---|
US (1) | US6629072B1 (en) |
EP (1) | EP1081683B1 (en) |
JP (1) | JP2001100790A (en) |
DE (2) | DE19941227A1 (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030163306A1 (en) * | 2002-02-28 | 2003-08-28 | Ntt Docomo, Inc. | Information recognition device and information recognition method |
US20040176953A1 (en) * | 2002-10-24 | 2004-09-09 | International Business Machines Corporation | Method and apparatus for a interactive voice response system |
US20060178882A1 (en) * | 2005-02-04 | 2006-08-10 | Vocollect, Inc. | Method and system for considering information about an expected response when performing speech recognition |
US7167544B1 (en) * | 1999-11-25 | 2007-01-23 | Siemens Aktiengesellschaft | Telecommunication system with error messages corresponding to speech recognition errors |
US20070192101A1 (en) * | 2005-02-04 | 2007-08-16 | Keith Braho | Methods and systems for optimizing model adaptation for a speech recognition system |
US20110029313A1 (en) * | 2005-02-04 | 2011-02-03 | Vocollect, Inc. | Methods and systems for adapting a model for a speech recognition system |
US7949533B2 (en) | 2005-02-04 | 2011-05-24 | Vococollect, Inc. | Methods and systems for assessing and improving the performance of a speech recognition system |
US20110208525A1 (en) * | 2007-07-02 | 2011-08-25 | Yuzuru Inoue | Voice recognizing apparatus |
US8200495B2 (en) | 2005-02-04 | 2012-06-12 | Vocollect, Inc. | Methods and systems for considering information about an expected response when performing speech recognition |
US20140142943A1 (en) * | 2012-11-22 | 2014-05-22 | Fujitsu Limited | Signal processing device, method for processing signal |
US8914290B2 (en) | 2011-05-20 | 2014-12-16 | Vocollect, Inc. | Systems and methods for dynamically improving user intelligibility of synthesized speech in a work environment |
US9978395B2 (en) | 2013-03-15 | 2018-05-22 | Vocollect, Inc. | Method and system for mitigating delay in receiving audio stream during production of sound from audio stream |
CN111179939A (en) * | 2020-04-13 | 2020-05-19 | 北京海天瑞声科技股份有限公司 | Voice transcription method, voice transcription device and computer storage medium |
WO2021134549A1 (en) * | 2019-12-31 | 2021-07-08 | 李庆远 | Human merging and training of multiple artificial intelligence outputs |
US11837253B2 (en) | 2016-07-27 | 2023-12-05 | Vocollect, Inc. | Distinguishing user speech from background speech in speech-dense environments |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7366667B2 (en) | 2001-12-21 | 2008-04-29 | Telefonaktiebolaget Lm Ericsson (Publ) | Method and device for pause limit values in speech recognition |
KR100834679B1 (en) * | 2006-10-31 | 2008-06-02 | 삼성전자주식회사 | Method and apparatus for alarming of speech-recognition error |
DE102011121110A1 (en) * | 2011-12-14 | 2013-06-20 | Volkswagen Aktiengesellschaft | Method for operating voice dialog system in vehicle, involves determining system status of voice dialog system, assigning color code to determined system status, and visualizing system status visualized in color according to color code |
CN112037775B (en) * | 2020-09-08 | 2021-09-14 | 北京嘀嘀无限科技发展有限公司 | Voice recognition method, device, equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2102171A (en) | 1981-06-24 | 1983-01-26 | John Graham Parkhouse | Speech aiding apparatus and method |
JPS59216242A (en) | 1983-05-25 | 1984-12-06 | Toshiba Corp | Voice recognizing response device |
WO1987007460A1 (en) | 1986-05-23 | 1987-12-03 | Devices Innovative | Voice activated telephone |
US5687288A (en) | 1994-09-20 | 1997-11-11 | U.S. Philips Corporation | System with speaking-rate-adaptive transition values for determining words from a speech signal |
US5870709A (en) * | 1995-12-04 | 1999-02-09 | Ordinate Corporation | Method and apparatus for combining information from speech signals for adaptive interaction in teaching and testing |
US6006175A (en) * | 1996-02-06 | 1999-12-21 | The Regents Of The University Of California | Methods and apparatus for non-acoustic speech characterization and recognition |
US6226615B1 (en) * | 1997-08-06 | 2001-05-01 | British Broadcasting Corporation | Spoken text display method and apparatus, for use in generating television signals |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS59176794A (en) * | 1983-03-25 | 1984-10-06 | シャープ株式会社 | Word voice recognition equipment |
JPS60237495A (en) * | 1984-05-09 | 1985-11-26 | シャープ株式会社 | Voice recognition equipment |
JPH067346B2 (en) * | 1984-08-14 | 1994-01-26 | シャープ株式会社 | Voice recognizer |
JPS62294298A (en) * | 1986-06-13 | 1987-12-21 | 松下電器産業株式会社 | Voice input unit |
JPS63161499A (en) * | 1986-12-24 | 1988-07-05 | 松下電器産業株式会社 | Voice recognition equipment |
JPH0434499A (en) * | 1990-05-30 | 1992-02-05 | Sharp Corp | Vocalization indicating method |
JPH07295588A (en) * | 1994-04-21 | 1995-11-10 | Nippon Hoso Kyokai <Nhk> | Estimating method for speed of utterance |
1999
- 1999-08-30 DE DE19941227A, patent DE19941227A1: not active, Withdrawn
2000
- 2000-08-28 US US09/649,675, patent US6629072B1: not active, Expired - Lifetime
- 2000-08-29 JP JP2000258958A, patent JP2001100790A: active, Pending
- 2000-08-29 DE DE50015670T, patent DE50015670D1: not active, Expired - Lifetime
- 2000-08-29 EP EP00202999A, patent EP1081683B1: not active, Expired - Lifetime
Cited By (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7167544B1 (en) * | 1999-11-25 | 2007-01-23 | Siemens Aktiengesellschaft | Telecommunication system with error messages corresponding to speech recognition errors |
US20030163306A1 (en) * | 2002-02-28 | 2003-08-28 | Ntt Docomo, Inc. | Information recognition device and information recognition method |
US7480616B2 (en) * | 2002-02-28 | 2009-01-20 | Ntt Docomo, Inc. | Information recognition device and information recognition method |
US7318029B2 (en) * | 2002-10-24 | 2008-01-08 | International Business Machines Corporation | Method and apparatus for a interactive voice response system |
US20040176953A1 (en) * | 2002-10-24 | 2004-09-09 | International Business Machines Corporation | Method and apparatus for a interactive voice response system |
US8374870B2 (en) | 2005-02-04 | 2013-02-12 | Vocollect, Inc. | Methods and systems for assessing and improving the performance of a speech recognition system |
US8756059B2 (en) | 2005-02-04 | 2014-06-17 | Vocollect, Inc. | Method and system for considering information about an expected response when performing speech recognition |
US7865362B2 (en) | 2005-02-04 | 2011-01-04 | Vocollect, Inc. | Method and system for considering information about an expected response when performing speech recognition |
US20110029313A1 (en) * | 2005-02-04 | 2011-02-03 | Vocollect, Inc. | Methods and systems for adapting a model for a speech recognition system |
US20110029312A1 (en) * | 2005-02-04 | 2011-02-03 | Vocollect, Inc. | Methods and systems for adapting a model for a speech recognition system |
US7895039B2 (en) * | 2005-02-04 | 2011-02-22 | Vocollect, Inc. | Methods and systems for optimizing model adaptation for a speech recognition system |
US20110093269A1 (en) * | 2005-02-04 | 2011-04-21 | Keith Braho | Method and system for considering information about an expected response when performing speech recognition |
US7949533B2 (en) | 2005-02-04 | 2011-05-24 | Vococollect, Inc. | Methods and systems for assessing and improving the performance of a speech recognition system |
US20110161083A1 (en) * | 2005-02-04 | 2011-06-30 | Keith Braho | Methods and systems for assessing and improving the performance of a speech recognition system |
US20110161082A1 (en) * | 2005-02-04 | 2011-06-30 | Keith Braho | Methods and systems for assessing and improving the performance of a speech recognition system |
US10068566B2 (en) | 2005-02-04 | 2018-09-04 | Vocollect, Inc. | Method and system for considering information about an expected response when performing speech recognition |
US8200495B2 (en) | 2005-02-04 | 2012-06-12 | Vocollect, Inc. | Methods and systems for considering information about an expected response when performing speech recognition |
US8255219B2 (en) | 2005-02-04 | 2012-08-28 | Vocollect, Inc. | Method and apparatus for determining a corrective action for a speech recognition system based on the performance of the system |
US20060178882A1 (en) * | 2005-02-04 | 2006-08-10 | Vocollect, Inc. | Method and system for considering information about an expected response when performing speech recognition |
US9928829B2 (en) | 2005-02-04 | 2018-03-27 | Vocollect, Inc. | Methods and systems for identifying errors in a speech recognition system |
US8612235B2 (en) | 2005-02-04 | 2013-12-17 | Vocollect, Inc. | Method and system for considering information about an expected response when performing speech recognition |
US9202458B2 (en) | 2005-02-04 | 2015-12-01 | Vocollect, Inc. | Methods and systems for adapting a model for a speech recognition system |
US20070192101A1 (en) * | 2005-02-04 | 2007-08-16 | Keith Braho | Methods and systems for optimizing model adaptation for a speech recognition system |
US8868421B2 (en) | 2005-02-04 | 2014-10-21 | Vocollect, Inc. | Methods and systems for identifying errors in a speech recognition system |
US8407051B2 (en) | 2007-07-02 | 2013-03-26 | Mitsubishi Electric Corporation | Speech recognizing apparatus |
US20110208525A1 (en) * | 2007-07-02 | 2011-08-25 | Yuzuru Inoue | Voice recognizing apparatus |
DE112008001334B4 (en) * | 2007-07-02 | 2016-12-15 | Mitsubishi Electric Corp. | Voice recognition device |
US9697818B2 (en) | 2011-05-20 | 2017-07-04 | Vocollect, Inc. | Systems and methods for dynamically improving user intelligibility of synthesized speech in a work environment |
US8914290B2 (en) | 2011-05-20 | 2014-12-16 | Vocollect, Inc. | Systems and methods for dynamically improving user intelligibility of synthesized speech in a work environment |
US10685643B2 (en) | 2011-05-20 | 2020-06-16 | Vocollect, Inc. | Systems and methods for dynamically improving user intelligibility of synthesized speech in a work environment |
US11810545B2 (en) | 2011-05-20 | 2023-11-07 | Vocollect, Inc. | Systems and methods for dynamically improving user intelligibility of synthesized speech in a work environment |
US11817078B2 (en) | 2011-05-20 | 2023-11-14 | Vocollect, Inc. | Systems and methods for dynamically improving user intelligibility of synthesized speech in a work environment |
US20140142943A1 (en) * | 2012-11-22 | 2014-05-22 | Fujitsu Limited | Signal processing device, method for processing signal |
US9978395B2 (en) | 2013-03-15 | 2018-05-22 | Vocollect, Inc. | Method and system for mitigating delay in receiving audio stream during production of sound from audio stream |
US11837253B2 (en) | 2016-07-27 | 2023-12-05 | Vocollect, Inc. | Distinguishing user speech from background speech in speech-dense environments |
WO2021134549A1 (en) * | 2019-12-31 | 2021-07-08 | 李庆远 | Human merging and training of multiple artificial intelligence outputs |
CN111179939A (en) * | 2020-04-13 | 2020-05-19 | 北京海天瑞声科技股份有限公司 | Voice transcription method, voice transcription device and computer storage medium |
CN111179939B (en) * | 2020-04-13 | 2020-07-28 | 北京海天瑞声科技股份有限公司 | Voice transcription method, voice transcription device and computer storage medium |
Also Published As
Publication number | Publication date |
---|---|
DE19941227A1 (en) | 2001-03-08 |
EP1081683A1 (en) | 2001-03-07 |
EP1081683B1 (en) | 2009-06-24 |
JP2001100790A (en) | 2001-04-13 |
DE50015670D1 (en) | 2009-08-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6629072B1 (en) | Method of an arrangement for speech recognition with speech velocity adaptation | |
US20230012984A1 (en) | Generation of automated message responses | |
US10276164B2 (en) | Multi-speaker speech recognition correction system | |
US7113908B2 (en) | Method for recognizing speech using eigenpronunciations | |
JP4867804B2 (en) | Voice recognition apparatus and conference system | |
US5791904A (en) | Speech training aid | |
EP2051241B1 (en) | Speech dialog system with play back of speech output adapted to the user | |
US9911408B2 (en) | Dynamic speech system tuning | |
US20130080172A1 (en) | Objective evaluation of synthesized speech attributes | |
JP2000181482A (en) | Voice recognition device and noninstruction and/or on- line adapting method for automatic voice recognition device | |
JPH06332495A (en) | Equipment and method for speech recognition | |
JPH05181494A (en) | Apparatus and method for identifying audio pattern | |
JP4246703B2 (en) | Automatic speech recognition method | |
US20070136060A1 (en) | Recognizing entries in lexical lists | |
US20020123893A1 (en) | Processing speech recognition errors in an embedded speech recognition system | |
JPH0876785A (en) | Voice recognition device | |
WO2006083020A1 (en) | Audio recognition system for generating response audio by using audio data extracted | |
EP1005019A3 (en) | Segment-based similarity measurement method for speech recognition | |
JP2004333543A (en) | System and method for speech interaction | |
JPH06110494A (en) | Pronounciation learning device | |
JP3798530B2 (en) | Speech recognition apparatus and speech recognition method | |
US6308152B1 (en) | Method and apparatus of speech recognition and speech control system using the speech recognition method | |
JP2003177779A (en) | Speaker learning method for speech recognition | |
EP0508225A2 (en) | Computer system for speech recognition | |
JP3277579B2 (en) | Voice recognition method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: U.S. PHILIPS CORPORATION, NEW YORK; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: THELEN, ERIC; WUTTE, HERIBERT; REEL/FRAME: 011282/0658; SIGNING DATES FROM 20000921 TO 20001019 |
| | AS | Assignment | Owner name: KONINKLIJKE PHILIPS ELECTRONICS N.V., NETHERLANDS; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: U.S. PHILIPS CORPORATION; REEL/FRAME: 014365/0563; Effective date: 20030721 |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | FPAY | Fee payment | Year of fee payment: 4 |
| | AS | Assignment | Owner name: NUANCE COMMUNICATIONS AUSTRIA GMBH, AUSTRIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: KONINKLIJKE PHILIPS ELECTRONICS N.V.; REEL/FRAME: 022299/0350; Effective date: 20090205 |
| | FPAY | Fee payment | Year of fee payment: 8 |
| | FPAY | Fee payment | Year of fee payment: 12 |