WO2004038695A1 - Directional speech recognition device and method - Google Patents

Directional speech recognition device and method

Info

Publication number
WO2004038695A1
Authority
WO
WIPO (PCT)
Prior art keywords
speech recognition
recognition device
microphones
user interface
target direction
Prior art date
Application number
PCT/EP2003/050685
Other languages
French (fr)
Inventor
Douglas Ralph Ealey
Original Assignee
Motorola Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Motorola Inc filed Critical Motorola Inc
Priority to AU2003282109A1
Publication of WO2004038695A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L2015/227 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of the speaker; Human-factor methodology
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Abstract

A speech recognition device (100), comprising one or more microphones (160, 162, 164), means to determine the direction of a signal source received by the microphones, and means for permitting recognition processing by the device. Permission is determined by means for comparing the direction of the signal source with a target direction (210, 314), and allowing recognition processing if the source direction is within an angular threshold (theta) of the target direction. The invention enhances the ability of voice-activated devices to discern speech that is not directly addressed to them.

Description

DIRECTIONAL SPEECH RECOGNITION DEVICE AND METHOD
Technical Field
The invention relates to the field of speech recognition.
Background
There is an emergent market for voice-controlled multi-modal, multimedia and telematic devices. This market raises the problem that such devices must be able to discern whether an utterance was addressed to that particular device or to some third party.
With devices that use voiced keywords or voiced dialling, one does not want the device to perform actions as a consequence of overhearing chance instances of those keywords or names whilst it rests on a desk or in a pocket. The consequences may be that the device inadvertently calls someone mentioned in an overheard conversation, or changes mode or application, confusing the user who finds the device interface changing apparently 'randomly' from use to use. Both behaviours would be seen as a severe disadvantage of voice control by the user.
One cannot simply rely on volume level to differentiate between overheard casual speech and close-talking use as with a normal telephone, because multimedia and multi-modal devices and telematic control systems are generally intended to be used at arm's length, either to view a display or because the device is on the dashboard.
Thus there is a need for an alternative method. US 6219645 B1 (Byers) and JP 2002091491 A (Sanyo) both describe placing multiple microphones in fixed positions in a room, enabling localisation of a user with the intent of associating commands then uttered with devices distributed within the room. However, such methods are not applicable to mobile telephony, for example.
US 5884254 A (Ucar) suggests the use of separate microphone arrangements for speech transmission and speech recognition functions respectively, but this would appear to incur unnecessary cost, complexity and weight.
The general approach in the art is to use a non-obvious keyword to precede interaction with the device, for example giving it a name to be addressed by. However, having to call your personal appliances by name is not likely to be a practical or particularly desirable solution in many circumstances.
Summary of the Invention
In a first aspect, the present invention provides a speech recognition device, as claimed in claim 1.
In a second aspect, the present invention provides a method for controlling speech recognition, as claimed in claim 11.
Further aspects are as claimed in the dependent claims.
Brief description of the drawings
FIG. 1 illustrates a possible microphone configuration in accordance with a preferred embodiment of the invention;
FIG. 2 illustrates a target direction for the user's voice, together with an angular threshold of deviation theta from the target direction;
FIG. 3 illustrates an arc of possible target directions for the user's voice, ranging between 'normal to the plane of the user interface' and 'parallel to the plane of the user interface'; and
FIG. 4 illustrates a system for the control of speech recognition in accordance with the preferred embodiment.
Detailed description of preferred embodiment
In a preferred embodiment, a speech recognition device is described in accordance with figure 4, comprising an input signal 404 and an estimate of the direction 406 of the input signal. The target direction selected from the target options stored in store 412 is compared with the direction 406 of the input signal by a processor 430. This comparison uses a threshold of deviation from the target direction, stored in store 410, to determine whether the input signal 404 should be passed as output 408 to a recogniser.
Figure 1 illustrates the preferred embodiment, comprising at least three microphones 160, 162, 164 that provide input to a means to determine the direction of a signal source. The speech recognition process is then controlled by: (i) comparing the direction of the signal source with a given target direction. Target directions are illustrated as direction 210 of figure 2, and directions generally shown as 314 in figure 3; and
(ii) permitting recognition processing if the source direction is within an angular threshold 220, 320 of the target direction.
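As an illustration of this gating step, the following Python sketch compares an estimated source direction against a target direction and permits recognition only inside the angular threshold. It assumes directions are represented as 3-D unit vectors; the function and variable names are illustrative and do not appear in the patent.

```python
import numpy as np

def recognition_permitted(source_dir, target_dir, threshold_deg):
    """Permit recognition only if the estimated source direction lies
    within the angular threshold of the selected target direction."""
    s = np.asarray(source_dir, dtype=float)
    t = np.asarray(target_dir, dtype=float)
    s /= np.linalg.norm(s)
    t /= np.linalg.norm(t)
    # Angle between the two unit vectors, clipped for numerical safety.
    angle_deg = np.degrees(np.arccos(np.clip(np.dot(s, t), -1.0, 1.0)))
    return angle_deg <= threshold_deg

# Target normal to the user interface (+z), 30 degree threshold (cf. FIG. 2).
print(recognition_permitted([0.2, 0.1, 1.0], [0.0, 0.0, 1.0], 30.0))  # True
```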
FIG. 2 illustrates a target direction for the user's voice that is substantially normal to the plane of the user interface and microphones, together with an angular threshold of deviation theta from the target direction.
In figure 2, the initial target direction 210 assumes that the user will wish to look at the user interface 250 of device 200 when controlling it. Consequently, in this preferred embodiment the microphones should be distributed to form a plane substantially parallel to the plane of the user interface 250. The target direction 210 can then be taken to be normal to the plane of the microphones and thus, implicitly, to the plane of the user interface 250. It is anticipated that the user will rarely be aligned exactly normal to the plane of the user interface 250, and so an angular threshold 220 is introduced, wherein the speech recognition process is still permitted if the user is within angle θ 230 of the target direction.
An alternative target direction can be selected either by the user via a user interface 450 or by automatic control 440 if the device is placed in a power and/or data cradle, or if it is left alone on a substantially horizontal surface. This target direction is described in accordance with figure 3.
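A minimal sketch of this mode selection follows, assuming the cradle detection and the stationary-time measurement are available as inputs (the patent does not specify how they are sensed); all names and the timeout value are illustrative.

```python
def select_target_mode(in_cradle, stationary_seconds,
                       user_override=None, stationary_limit=60.0):
    """Pick the active target-direction mode.

    user_override      -- a target chosen manually via the user interface
    in_cradle          -- True when the device sits in a power/data cradle
    stationary_seconds -- time the device has rested on a horizontal surface
    """
    if user_override is not None:
        return user_override      # manual redefinition always wins
    if in_cradle or stationary_seconds >= stationary_limit:
        return "arc"              # alternative arc of targets (FIG. 3)
    return "normal"               # default: normal to the interface (FIG. 2)
```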
FIG. 3 illustrates an arc of possible target directions for the user's voice, ranging between 'normal to the plane of the user interface' and 'parallel to the plane of the user interface', sweeping downwards along the vertical axis of the plane of the user interface, together with an angular threshold of deviation theta from the possible target directions.
The alternative target direction may therefore be anywhere on an arc 314 that is centred on the vertical axis of the plane of the user interface 350. Arc 314 extends from the normal to the plane of the user interface 310 and proceeds down the vertical axis until parallel to the plane of the user interface 312.
It is anticipated that the user need not be exactly aligned on this arc. Therefore an angular threshold 320 is introduced, wherein the speech recognition process is still permitted if the user is within angle θ 330 of the target direction.
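One way to test against this arc of targets is to take the minimum angle between the estimated source direction and any direction on the quarter-arc from the interface normal down to the in-plane vertical. The sketch below assumes the interface normal is +z and 'downwards' along the interface plane is -y; these axes and the sampling of the arc are illustrative choices, not taken from the patent.

```python
import numpy as np

def within_arc_threshold(source_dir, threshold_deg, n_samples=91):
    """Permit recognition if the source direction is within the angular
    threshold of any target on the arc between the interface normal (+z)
    and the downward in-plane vertical (-y)."""
    s = np.asarray(source_dir, dtype=float)
    s /= np.linalg.norm(s)
    best = 180.0
    for phi in np.linspace(0.0, np.pi / 2.0, n_samples):
        # Candidate target rotated by phi from the normal towards 'down'.
        t = np.array([0.0, -np.sin(phi), np.cos(phi)])
        angle = np.degrees(np.arccos(np.clip(np.dot(s, t), -1.0, 1.0)))
        best = min(best, angle)
    return best <= threshold_deg

# A voice arriving 45 degrees below the normal lies on the arc itself.
print(within_arc_threshold([0.0, -1.0, 1.0], 15.0))  # True
```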
The angular control of the speech recognition system can be specified by the user in situations where the user needs to control the device from a relative position other than those permitted by the configurations above. In such an instance the user may manually indicate the wish to redefine the target direction via the user interface 450, and then speak from that direction to set the device.
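A hypothetical sketch of this manual redefinition flow: the user requests redefinition through the interface, and the direction estimated for their next utterance replaces the stored target. The class and method names are invented for illustration.

```python
class TargetStore:
    """Holds the currently selected target direction (store 412 of FIG. 4)."""

    def __init__(self, default_dir=(0.0, 0.0, 1.0)):
        self.target_dir = default_dir   # default: normal to the interface

    def redefine_from_utterance(self, estimated_dir):
        """After the user asks to redefine the target via the interface,
        the direction estimated for the next utterance becomes the target."""
        self.target_dir = tuple(estimated_dir)
        return self.target_dir
```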
In the preferred embodiment, the user's direction is determined by comparing the relative signal delay between pairs of microphones 160, 162, 164, and then using these delays and the positions of the corresponding microphones to calculate the signal direction.
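The patent does not prescribe a particular estimator; one common far-field approach, in the spirit of the time-delay literature listed among the non-patent references, solves a small least-squares system relating the pairwise delays to the source direction. The sketch below assumes plane-wave propagation and known microphone coordinates; all names are illustrative.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate speed of sound in air

def direction_from_delays(mic_positions, pair_delays):
    """Least-squares far-field direction estimate from pairwise delays.

    mic_positions -- (M, 3) array of microphone coordinates in metres
    pair_delays   -- {(i, j): tau_ij}, tau_ij = arrival_time_i - arrival_time_j
    Returns a unit vector pointing from the device towards the source.
    """
    rows, rhs = [], []
    for (i, j), tau in pair_delays.items():
        # Plane wave from direction d gives t_i - t_j = (r_j - r_i) . d / c
        rows.append(mic_positions[j] - mic_positions[i])
        rhs.append(SPEED_OF_SOUND * tau)
    d, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(rhs), rcond=None)
    return d / np.linalg.norm(d)

# Synthetic check: delays generated for a source 30 degrees off the x-axis in
# the microphone plane (coplanar microphones only observe the in-plane part).
true_d = np.array([np.cos(np.radians(30.0)), np.sin(np.radians(30.0)), 0.0])
mics = np.array([[0.00, 0.00, 0.0], [0.05, 0.00, 0.0], [0.00, 0.05, 0.0]])
delays = {(0, 1): (mics[1] - mics[0]) @ true_d / SPEED_OF_SOUND,
          (0, 2): (mics[2] - mics[0]) @ true_d / SPEED_OF_SOUND}
print(direction_from_delays(mics, delays))  # approximately [0.866, 0.5, 0.0]
```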
The angular control of the speech recognition system can be overridden automatically in situations where the signal from any one microphone 160, 162, 164 falls below an amplitude ratio with respect to the signals from the remaining microphones, as when the device is held in a typical phone position, so favouring reception by any microphone near the mouth.
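A sketch of that override, under the stated assumption that 'falls below an amplitude ratio' means one microphone's short-term level drops below a fixed fraction of the mean level of the others; the ratio value and function names are illustrative.

```python
import numpy as np

def angular_gate_overridden(frames, min_ratio=0.25):
    """Bypass the angular gate when one microphone is much quieter than the
    rest, e.g. the device is held against the ear in a typical phone position.

    frames    -- (M, N) array: one row of N audio samples per microphone
    min_ratio -- fraction of the other microphones' mean level that triggers
                 the override (illustrative value)
    """
    rms = np.sqrt(np.mean(np.square(frames.astype(float)), axis=1))
    for i, level in enumerate(rms):
        others = np.delete(rms, i)
        if level < min_ratio * others.mean():
            return True   # strongly unbalanced levels: assume close-talking use
    return False
```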
Finally, in the preferred embodiment it is desirable to provide a facility to reversibly enable or disable the angular control of the speech recognition system via the user interface 450, for example in a dictation scenario.

Claims

1. A speech recognition device (100), comprising: one or more microphones (160, 162, 164); means to determine the direction of a signal source received by the one or more microphones (160, 162, 164); means for permitting recognition processing by the device, characterised by: means for comparing the direction of the signal source with a target direction (210, 314), and permitting recognition processing if the source direction is within an angular threshold (theta) of the target direction (210, 314).
2. A speech recognition device (100), according to claim 1, wherein the target direction (210) is substantially normal to the plane of the user interface.
3. A speech recognition device (100), according to claim 1 or claim 2, wherein the target direction (314) is anywhere on an arc (314) centred on the vertical axis of the plane of the user interface, the arc being from substantially normal (310) to the plane of the user interface and proceeding down the vertical axis until substantially parallel (312) to the plane of the user interface.
4. A speech recognition device (100), according to claim 1 or claim 2, wherein the device is adapted to change the target direction automatically to that of claim 3 when either the device is placed in a cradle provided for it, or the device determines that it has been stationary for a predetermined period of time.
5. A speech recognition device (100), according to any previous claim, wherein user-operable means are provided, the user-operable means being adapted to enable the operator to redefine the target direction manually.
6. A speech recognition device (100) according to any previous claim, wherein the microphones (160, 162, 164) are distributed in a plane substantially parallel to the plane of the user interface of the device.
7. A speech recognition device (100) according to any previous claim, further comprising means for comparing the relative signal delay between any given pair of microphones (160, 162, 164).
8. A speech recognition device (100) according to any previous claim, comprising means for calculating signal direction using relative signal delays between microphone pairs and the positions of the microphones (160, 162, 164).
9. A speech recognition device (100) according to any previous claim, wherein the use of an angular threshold to permit recognition processing is overridden if the signal from any one microphone (160, 162, 164) falls below an amplitude ratio with respect to the signals from the remaining microphones.
10. A speech recognition device (100) according to any previous claim, wherein the use of an angular threshold to permit recognition processing may be reversibly enabled or disabled by the user via the user interface.
11. A method for control of speech recognition, comprising: provision of at least one microphone (160, 162, 164), the at least one microphone providing input to a means for determining the direction of a signal source; provision of means for permitting recognition processing by the device, characterised by: comparing the direction of the signal source with a target direction (210, 314), and permitting recognition processing if the source direction is within an angular threshold (theta) of the target direction.
PCT/EP2003/050685 2002-10-25 2003-10-03 Directional speech recognition device and method WO2004038695A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003282109A AU2003282109A1 (en) 2002-10-25 2003-10-03 Directional speech recognition device and method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0224797.1 2002-10-25
GB0224797A GB2394589B (en) 2002-10-25 2002-10-25 Speech recognition device and method

Publications (1)

Publication Number Publication Date
WO2004038695A1 WO2004038695A1 (en)

Family

ID=9946532

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2003/050685 WO2004038695A1 (en) 2002-10-25 2003-10-03 Directional speech recognition device and method

Country Status (4)

Country Link
AU (1) AU2003282109A1 (en)
GB (1) GB2394589B (en)
HK (1) HK1063372A1 (en)
WO (1) WO2004038695A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9691413B2 (en) 2015-10-06 2017-06-27 Microsoft Technology Licensing, Llc Identifying sound from a source of interest based on multiple audio feeds

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8767975B2 (en) 2007-06-21 2014-07-01 Bose Corporation Sound discrimination method and apparatus
US8611554B2 (en) 2008-04-22 2013-12-17 Bose Corporation Hearing assistance apparatus
US9078077B2 (en) 2010-10-21 2015-07-07 Bose Corporation Estimation of synthetic audio prototypes with frequency-based input signal decomposition
EP2911149B1 (en) * 2014-02-19 2019-04-17 Nokia Technologies OY Determination of an operational directive based at least in part on a spatial audio property
CN112216275A (en) * 2019-07-10 2021-01-12 阿里巴巴集团控股有限公司 Voice information processing method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6219645B1 (en) * 1999-12-02 2001-04-17 Lucent Technologies, Inc. Enhanced automatic speech recognition using multiple directional microphones
JP2001359185A (en) * 2000-06-13 2001-12-26 Matsushita Electric Ind Co Ltd Hands-free device and audio signal processing method therefor
DE10058786A1 (en) * 2000-11-27 2002-06-13 Philips Corp Intellectual Pty Method for controlling a device having an acoustic output device
US20030125959A1 (en) * 2001-12-31 2003-07-03 Palmquist Robert D. Translation device with planar microphone array

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5884254A (en) * 1995-08-02 1999-03-16 Sensimetrics Corporation Method and apparatus for restricting microphone acceptance angle
JP2002091491A (en) * 2000-09-20 2002-03-27 Sanyo Electric Co Ltd Voice control system for plural pieces of equipment
JP3910898B2 (en) * 2002-09-17 2007-04-25 株式会社東芝 Directivity setting device, directivity setting method, and directivity setting program

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6219645B1 (en) * 1999-12-02 2001-04-17 Lucent Technologies, Inc. Enhanced automatic speech recognition using multiple directional microphones
JP2001359185A (en) * 2000-06-13 2001-12-26 Matsushita Electric Ind Co Ltd Hands-free device and audio signal processing method therefor
DE10058786A1 (en) * 2000-11-27 2002-06-13 Philips Corp Intellectual Pty Method for controlling a device having an acoustic output device
US20030138118A1 (en) * 2000-11-27 2003-07-24 Volker Stahl Method for control of a unit comprising an acoustic output device
US20030125959A1 (en) * 2001-12-31 2003-07-03 Palmquist Robert D. Translation device with planar microphone array

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BRANDSTEIN M S ET AL: "A practical time-delay estimator for localizing speech sources with a microphone array", COMPUTER SPEECH AND LANGUAGE, ACADEMIC PRESS, LONDON, GB, vol. 9, no. 2, April 1995 (1995-04-01), pages 153 - 169, XP004418823, ISSN: 0885-2308 *
PATENT ABSTRACTS OF JAPAN vol. 2002, no. 04 4 August 2002 (2002-08-04) *

Also Published As

Publication number Publication date
HK1063372A1 (en) 2004-12-24
GB0224797D0 (en) 2002-12-04
AU2003282109A1 (en) 2004-05-13
GB2394589B (en) 2005-05-25
GB2394589A (en) 2004-04-28

Similar Documents

Publication Publication Date Title
US7986802B2 (en) Portable electronic device and personal hands-free accessory with audio disable
US20100330908A1 (en) Telecommunications device with voice-controlled functions
US20100332236A1 (en) Voice-triggered operation of electronic devices
US6993366B2 (en) Portable telephone, control method thereof, and recording medium therefor
EP0393059B1 (en) Method for terminating a telephone call by voice command
EP2698787B1 (en) Method for providing voice call using text data and electronic device thereof
US9680883B1 (en) Techniques for integrating voice control into an active telephony call
WO2017220856A1 (en) Electronic accessory incorporating dynamic user-controlled audio muting capabilities, related methods and communications terminal
US20140314242A1 (en) Ambient Sound Enablement for Headsets
US20090022305A1 (en) Phone call mute notification
WO2023029299A1 (en) Earphone-based communication method, earphone device, and computer-readable storage medium
CA2426523A1 (en) Method of compensating for beamformer steering delay during handsfree speech recognition
CA2534774C (en) System and method of safe and automatic acoustic volume adjustment for handsfree operation
US20190235832A1 (en) Personal Communicator Systems and Methods
WO2004038695A1 (en) Directional speech recognition device and method
KR20090027817A (en) Method for output background sound and mobile communication terminal using the same
JPH1127376A (en) Voice communication equipment
JP4299768B2 (en) Voice recognition device, method, and portable information terminal device using voice recognition method
KR100384697B1 (en) Method for checking hands-free state of mobile phone
JP3384282B2 (en) Telephone equipment
US20040042590A1 (en) Method for operating a device for message storage in a communications terminal, and a communications device
US20220148588A1 (en) Call Termination Apparatus and Method Thereof
US20200098363A1 (en) Electronic device
CN117412216A (en) Earphone, control method and control device thereof
KR100561774B1 (en) Method for adjusting a volume of voice automatically

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP