US20040260549A1 - Voice recognition system and method


Info

Publication number
US20040260549A1
Authority
US
United States
Prior art keywords
voice
talk
input
simulated
microphone
Legal status
Granted
Application number
US10/835,742
Other versions
US7552050B2 (en)
Inventor
Shuichi Matsumoto
Toru Marumoto
Current Assignee
Alpine Electronics Inc
Original Assignee
Alpine Electronics Inc
Application filed by Alpine Electronics Inc
Assigned to Alpine Electronics, Inc. Assignors: Shuichi Matsumoto, Toru Marumoto
Publication of US20040260549A1
Application granted
Publication of US7552050B2
Status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech

Definitions

  • the present invention relates to voice recognition systems and methods for recognizing voice commands issued by users so as to control devices, and more particularly, to a voice recognition system having a talk-back function of feeding back the recognized voice to a user.
  • the presently preferred embodiments relate to a voice recognition system that allows a user to input his/her voice to operate a device such as a navigation system, hands-free device, or personal computer mounted in a vehicle.
  • a voice recognition system may be used in addition to or instead of a remote control, a touch panel, a keyboard, or a mouse.
  • Most of the voice recognition systems have a talk-back function of feeding back the recognized voice to the user via a speaker.
  • the user listens to the talk-back voice to check whether it has been correctly recognized. If the recognition result is wrong, the user inputs his/her voice once again, and if the recognition result is correct, the user supplies the corresponding information to the system. In response to the user's instruction, the voice recognition system performs various controls.
  • a plurality of voice commands used in the voice recognition system are divided into a plurality of levels according to the type of operation to be performed on a device to be controlled. For example, to specify a destination in a navigation system by inputting an address, the user inputs the address aloud by dividing it into a plurality of levels, such as “prefecture ⁇ city, town (or village) ⁇ the rest of the address”.
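As a concrete illustration of this level-by-level structure, the sketch below models the address hierarchy as a nested mapping and returns the vocabulary the recognizer would accept at each level. The place names, level labels, and helper function are hypothetical examples, not part of the patent.

```python
# Hypothetical sketch: voice commands divided into levels, where each recognized
# level narrows the vocabulary accepted at the next level (prefecture -> city -> rest).
ADDRESS_LEVELS = {
    "prefecture": ["Tokyo", "Kanagawa", "Osaka"],   # level 1 vocabulary
    "city": {                                       # level 2 depends on level 1
        "Tokyo": ["Shinjuku", "Ota"],
        "Kanagawa": ["Yokohama", "Kawasaki"],
    },
}

def next_vocabulary(level, previous=None):
    """Return the commands the recognizer should accept at the given level."""
    entry = ADDRESS_LEVELS[level]
    return entry if isinstance(entry, list) else entry.get(previous, [])

print(next_vocabulary("prefecture"))        # ['Tokyo', 'Kanagawa', 'Osaka']
print(next_vocabulary("city", "Kanagawa"))  # ['Yokohama', 'Kawasaki']
```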
  • Japanese Unexamined Patent Application Publication No. 6-149287 discloses a system in which the voice recognition time is reduced by decreasing the computation amount of a talk-back voice.
  • FIG. 4A illustrates a timing chart of the voice input enable state in a known voice recognition system.
  • the above-described first approach is adopted to input the voice.
  • in the first approach, when the user first presses the speech button, the system enters the voice recognition state to receive the voice input for a predetermined time period. During this period, the user inputs desired voice commands. After the user inputs the voice, the voice recognition system recognizes the input voice and outputs a talk-back voice. During this period, voice input is not accepted. After the talk-back operation, the system once again enters the voice input enable state to enable the user to input his/her voice.
  • the user can press the speech button to interrupt the talk-back operation and continue to input his/her voice.
  • the user has to press the speech button every time he/she inputs voice for each level, thereby making the operation very complicated.
  • a voice recognition system having a talk-back function of recognizing a voice input into a microphone and outputting a talk-back voice from a speaker.
  • the voice recognition system includes: an adaptive filter unit for generating a simulated talk-back voice inputted into the microphone by setting a filter coefficient simulating a transfer system in which the talk-back voice outputted from the speaker is inputted into the microphone and by filtering the talk-back voice before being outputted from the speaker; and an input-voice extracting unit for extracting the input voice by subtracting the simulated talk-back voice from sound inputted into the microphone.
  • a voice recognition system having a talk-back function of recognizing a voice input into a microphone and outputting a talk-back voice from a first speaker.
  • the voice recognition system includes: a first adaptive filter unit for generating a simulated talk-back voice inputted into the microphone by setting a first filter coefficient simulating a transfer system in which the talk-back voice outputted from the first speaker is inputted into the microphone and by filtering the talk-back voice before being outputted from the first speaker; a second adaptive filter unit for generating a simulated audio sound inputted into the microphone by setting a second filter coefficient simulating a transfer system in which an audio sound outputted from a second speaker is inputted into the microphone and by filtering the audio sound before being outputted from the second speaker; and an input-voice extracting unit for extracting the input voice by subtracting the simulated talk-back voice and the simulated audio sound from sound inputted into the microphone.
  • a voice recognition system having a talk-back function of recognizing a voice inputted into a microphone and outputting a talk-back voice from a speaker.
  • the voice recognition system includes: a first adaptive filter unit for generating a simulated mixed voice inputted into the microphone by setting a first filter coefficient simulating a transfer system in which a mixed voice including the talk-back voice outputted from the speaker and an audio sound are inputted into the microphone and by filtering the mixed voice before being outputted from the speaker; and an input-voice extracting unit for extracting the input voice by subtracting the simulated mixed voice from sound inputted into the microphone.
  • a voice recognition method including the acts of: setting a voice input state to a disable state in which voice input is not accepted when recognizing a voice input into a microphone by a recognition processor; setting a voice input state to an enable state in which voice input is accepted when starting outputting from a speaker a talk-back voice obtained as a result of recognizing the voice by the recognition processor; generating a simulated talk-back voice to be inputted into the microphone by setting in an adaptive filter unit a filter coefficient simulating a transfer system in which the talk-back voice outputted from the speaker is inputted into the microphone and by filtering the talk-back voice before being outputted from the speaker; and extracting the input voice and supplying the extracted voice to the recognition processor by subtracting the simulated talk-back voice from sound inputted into the microphone when the voice input state is set in the enable state.
  • a voice recognition method including the acts of: setting a voice input state to a disable state in which voice input is not accepted when recognizing a voice input into a microphone by a recognition processor; setting a voice input state to an enable state in which voice input is accepted when starting outputting from a first speaker a talk-back voice as a result of recognizing the voice by the recognition processor; generating a simulated talk-back voice to be inputted into the microphone by setting in a first adaptive filter unit a first filter coefficient simulating a transfer system in which the talk-back voice outputted from the first speaker is inputted into the microphone and by filtering the talk-back voice before being outputted from the first speaker; generating a simulated audio sound to be inputted into the microphone by setting in a second adaptive filter unit a second filter coefficient simulating a transfer system in which an audio sound outputted from a second speaker is inputted into the microphone and by filtering the audio sound before being outputted from the second speaker; and
  • a voice recognition method including the acts of: setting a voice input state to a disable state in which voice input is not accepted when recognizing a voice input into a microphone by a recognition processor; setting a voice input state to an enable state in which voice input is accepted when starting outputting from a speaker a talk-back voice obtained as a result of recognizing the voice by the recognition processor; generating a simulated mixed voice to be inputted into the microphone by setting in an adaptive filter unit a filter coefficient simulating a transfer system in which a mixed voice including the talk-back voice and an audio sound output from the speaker is inputted into the microphone and by filtering the mixed voice before being outputted from the speaker; and extracting the input voice and supplying the extracted voice to the recognition processor by subtracting the simulated mixed voice from sound inputted into the microphone when the voice input state is set in the enable state.
  • a talk-back voice outputted from a speaker and inputted into a microphone is estimated by the adaptive filter unit.
  • the estimated value of the talk-back voice is then subtracted from sound inputted into the microphone.
  • only the input voice can be extracted from the sound including the input voice and other sound.
  • FIG. 1 is a block diagram illustrating elements of a voice recognition system according to a first embodiment of the present invention
  • FIG. 2 illustrates the configuration of an adaptive filter used in the voice recognition system shown in FIG. 1;
  • FIG. 3 is a flowchart illustrating the voice recognition processing performed by the voice recognition system shown in FIG. 1;
  • FIG. 4A is a timing chart illustrating the voice input enable state of a known voice recognition system
  • FIG. 4B is a timing chart illustrating the voice input enable state of the voice recognition system of the first embodiment
  • FIG. 5 is a block diagram illustrating elements of a voice recognition system according to a second embodiment of the present invention.
  • FIG. 6 is a flowchart illustrating the voice recognition processing performed by the voice recognition system shown in FIG. 5;
  • FIG. 7 is a block diagram illustrating the essential elements of a voice recognition system according to a third embodiment of the present invention.
  • a voice recognition system 100 constructed in accordance with a first embodiment of the present invention is described below with reference to FIGS. 1 through 4.
  • the voice recognition system 100 includes a volume device or an equalizer (hereinafter referred to as the “volume device”) 1 , a gain controller 2 , an output amplifier 3 , an adaptive filter (ADF) 4 , a subtractor 5 , a voice output unit 51 , a speaker 52 , a microphone 53 , and a voice recognition engine 54 .
  • the voice output unit 51 generates a talk-back voice and outputs it.
  • the gain of the talk-back voice is then controlled in the volume device 1 and the resulting talk-back voice is amplified in the output amplifier 3 , and then, it is outputted from the speaker 52 .
  • the microphone 53 is used for inputting the user's voice.
  • in practice, however, a talk-back voice output from the speaker 52 and surrounding noise, for example, engine noise (noise occurring when a vehicle is running), are also inputted into the microphone 53.
  • the voice recognition engine 54 recognizes the input voice from the microphone 53 and executes a command corresponding to the input voice for a device to be controlled (not shown), for example, a navigation system.
  • the adaptive filter 4 includes, as shown in FIG. 2, a coefficient identification unit 21 and a voice correction filter 22 .
  • the coefficient identification unit 21 is a filter for identifying the transfer function (the filter coefficient of the voice correction filter 22 ) of the acoustic system from the speaker 52 to the microphone 53 , and more specifically, the coefficient identification unit 21 is an adaptive filter using the least mean square (LMS) algorithm or normalized-LMS (N-LMS) algorithm.
  • the voice correction filter 22 performs convolutional computation by using the filter coefficient w(n) determined by the coefficient identification unit 21 and a talk-back voice x(n) to be controlled so as to supply the same transfer characteristic as that of the acoustic system to the talk-back voice x(n). As a result, a simulated talk-back voice y(n) simulating the talk-back voice inputted into the microphone 53 is generated.
  • the adaptive filter 4 may also be referred to as an adaptive filter unit.
  • the subtractor 5 subtracts the simulated talk-back voice y(n) generated by the adaptive filter 4 from the voice (mixed voice including the voice command, talk-back voice, and surrounding noise) inputted into the microphone 53 so as to extract the voice command (input voice) and surrounding noise (for example, engine noise).
  • the subtractor 5 may be referred to as an input-voice extracting unit.
  • the mixed voice including the input voice and surrounding noise extracted by the subtractor 5 is supplied to the voice recognition engine 54 .
  • the voice recognition engine 54 recognizes the voice command.
  • the mixed voice extracted by the subtractor 5 is also fed back to the coefficient identification unit 21 of the adaptive filter 4 and the gain controller 2 as the error e(n).
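The adaptive filter 4 and subtractor 5 described in the preceding bullets together form an acoustic echo canceller for the talk-back voice. The sketch below is one minimal NLMS implementation of that loop; the tap count, step size, and sample-by-sample handling are assumptions for illustration, not values taken from the patent.

```python
import numpy as np

class TalkBackCanceller:
    """Minimal NLMS sketch of the coefficient identification unit 21, the voice
    correction filter 22, and the subtractor 5 (assumed parameters)."""

    def __init__(self, taps=256, mu=0.5, eps=1e-8):
        self.w = np.zeros(taps)       # filter coefficients w(n): speaker-to-mic model
        self.x_buf = np.zeros(taps)   # recent talk-back reference samples x(n)
        self.mu = mu                  # NLMS step size (assumed)
        self.eps = eps                # regularization against division by zero

    def process(self, x_n, mic_n):
        """x_n: talk-back sample sent to the speaker; mic_n: microphone sample.
        Returns e(n), the microphone signal minus the simulated talk-back voice."""
        self.x_buf = np.roll(self.x_buf, 1)
        self.x_buf[0] = x_n
        y_n = float(self.w @ self.x_buf)                  # voice correction filter 22: y(n)
        e_n = mic_n - y_n                                 # subtractor 5: input voice + noise
        norm = float(self.x_buf @ self.x_buf) + self.eps
        self.w += (self.mu * e_n / norm) * self.x_buf     # unit 21: minimize |e(n)|^2
        return e_n
```

The returned error e(n) plays the double role the bullet above describes: it is the signal handed to the voice recognition engine 54 and the feedback that drives both the coefficient update and the gain controller 2.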
  • the gain controller 2 calculates the optimal gain to be added to the talk-back voice output from the voice output unit 51 , and outputs the calculated gain to the volume device 1 .
  • the mixed voice e(n) is considered as noise for the talk-back voice, and the gain of the talk-back voice to be outputted from the speaker 52 is adjusted so that the talk-back voice can be articulated to the user.
  • the volume device 1 performs gain correction for the talk-back voice outputted from the voice output unit 51 . More specifically, the volume device 1 corrects for the talk-back voice outputted from the voice output unit 51 by supplying the gain calculated by the gain controller 2 to the talk-back voice. This correction is conducted for, for example, each of a plurality of divided frequency bands.
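The gain controller 2 and volume device 1 are specified only functionally, so the following is just one plausible band-wise policy: boost the talk-back voice in each band until it sits a fixed margin above the noise estimated from e(n). The band edges, target margin, and gain cap are invented for the sketch.

```python
import numpy as np

def talkback_band_gains(error_frame, talkback_frame, sample_rate=16000,
                        bands=((100, 500), (500, 2000), (2000, 6000)),
                        target_margin_db=10.0, max_gain_db=12.0):
    """Per-band gain (in dB) for the volume device/equalizer.  `error_frame` is a
    frame of e(n) used as the noise estimate; both frames have the same length."""
    noise_spec = np.abs(np.fft.rfft(error_frame)) ** 2
    talk_spec = np.abs(np.fft.rfft(talkback_frame)) ** 2
    freqs = np.fft.rfftfreq(len(error_frame), 1.0 / sample_rate)
    gains = []
    for lo, hi in bands:
        sel = (freqs >= lo) & (freqs < hi)
        snr_db = 10.0 * np.log10((talk_spec[sel].mean() + 1e-12) /
                                 (noise_spec[sel].mean() + 1e-12))
        gains.append(float(np.clip(target_margin_db - snr_db, 0.0, max_gain_db)))
    return gains
```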
  • the operation of the voice recognition system 100 configured as described above is briefly described.
  • the gain of the talk-back voice outputted from the voice output unit 51 is adjusted by the volume device 1 and the gain controller 2 so as to improve the articulation of the talk-back voice.
  • the talk-back voice outputted from the volume device 1 is amplified in the output amplifier 3 at a predetermined amplifying factor, and is then outputted from the speaker 52.
  • the talk-back voice outputted from the speaker 52 is inputted into the microphone 53 .
  • if the user issues a voice command, the voice command is also inputted into the microphone 53, and if the vehicle is running, surrounding noise, for example, engine sound or road noise, is also picked up. Accordingly, the talk-back voice, input voice, and surrounding noise are inputted into the microphone 53 in a mixed manner.
  • This mixed voice is inputted into the positive terminal of the subtractor 5 .
  • the simulated talk-back voice (estimated value of the talk-back voice) generated by the adaptive filter 4 is inputted into the negative terminal of the subtractor 5 .
  • the subtractor 5 subtracts the simulated talk-back voice outputted from the adaptive filter 4 from the mixed voice output from the microphone 53 to calculate the error and extract the input voice and surrounding noise.
  • the extracted input voice and surrounding noise are supplied to the voice recognition engine 54 .
  • the voice recognition engine 54 then performs noise reduction and voice-command recognition.
  • the extracted input voice and surrounding noise are fed back to the gain controller 2 and the adaptive filter 4 so that they can be used for improving the articulation of the talk-back voice and for calculating the estimated value of the talk-back voice.
  • FIG. 3 is a flowchart illustrating the voice recognition processing performed by the voice recognition system 100 shown in FIG. 1 of the first embodiment. Although it is not shown in FIG. 1, the voice recognition system 100 is provided with a controller for controlling the entire voice recognition operation. The flowchart of FIG. 3 is executed under the control of this controller.
  • in act S1, the controller detects a voice recognition start trigger (for example, an operation for pressing the speech button or voice input of a predetermined keyword). Then, in act S2, the controller turns on the voice recognition engine 54 so that the system enters the voice input enable state. In this state, in act S3, the user inputs a first command, which is the topmost level of voice commands consisting of a plurality of levels.
  • the issued voice command is inputted into the microphone 53 , and is supplied to the voice recognition engine 54 via the subtractor 5 .
  • the voice recognition engine 54 performs voice recognition processing (including noise reduction). In this case, the controller turns off the voice recognition engine 54 to cancel the voice input enable state.
  • the volume device 1 and the gain controller 2 start improving the articulation of the talk-back voice.
  • the voice output unit 51 starts outputting a recognition result obtained from the voice recognition engine 54 and a talk-back voice.
  • in act S7, the controller determines during this talk-back operation whether it is necessary to continue to input voice by, for example, shifting to a lower level of commands. If the outcome of act S7 is yes, the process proceeds to act S8 in which the controller turns on the voice recognition engine 54 again so that the system enters the voice input enable state. Subsequently, in act S9, the subtractor 5 obtains the estimated value of the talk-back voice output in act S6 from the adaptive filter 4, and subtracts the estimated value from the input voice from the microphone 53 so as to remove the talk-back voice from the input voice.
  • the controller determines in act S10 whether a voice command has been issued. If a voice command is not issued, the process returns to act S9, and the loop operation is repeated until a voice command is issued. If a voice command is not issued for a predetermined period, the time-out processing is performed. If a voice command is issued in act S10, the process proceeds to act S11 in which the controller interrupts the talk-back operation, and returns to act S4. In this process, the talk-back operation is interrupted when a voice command is issued. It is not essential, however, that the talk-back operation be interrupted since the talk-back voice has been removed from the input voice.
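Putting acts S1 through S11 together, the control loop can be summarized as below. Every object and method name here is a placeholder standing in for the controller, recognizer, talk-back output, and echo canceller; it is a sketch of the flow in FIG. 3, not an actual API.

```python
def run_voice_dialog(mic, recognizer, talk_back, canceller, has_lower_level):
    """Sketch of FIG. 3: recognition disables voice input, starting the talk-back
    re-enables it, and the canceller removes the talk-back voice so the user can
    speak over it (all objects are hypothetical placeholders)."""
    recognizer.wait_for_trigger()                 # S1: speech button or keyword
    utterance = recognizer.listen(mic)            # S2-S3: input enabled, top level spoken
    while True:
        result = recognizer.recognize(utterance)  # S4: input disabled while recognizing
        talk_back.start(result)                   # S5-S6: articulation control, talk-back out
        if not has_lower_level(result):           # S7: nothing left to input -> done
            return result
        recognizer.enable_input()                 # S8: re-enter the voice input enable state
        frame = canceller.process(mic.read(), talk_back.reference())   # S9: remove talk-back
        while not recognizer.voice_detected(frame):                    # S10 (with time-out)
            frame = canceller.process(mic.read(), talk_back.reference())
        talk_back.interrupt()                     # S11: optional, talk-back already removed
        utterance = recognizer.capture(mic, canceller)   # collect the next-level command
```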
  • FIG. 4B is a timing chart illustrating a change in the voice input enable state of the voice recognition system 100 of this embodiment in comparison with that of a known voice recognition system shown in FIG. 4A. The operation of the known voice recognition system has been discussed above.
  • in this embodiment, when the user first presses the speech button, the system enters the voice recognition mode to enable the user to input voice for a predetermined time period. During this period, the user issues desired voice commands. When a voice command is issued, the input voice is recognized, and a talk-back voice is outputted. The operation up to this stage is the same as that of the known voice recognition system shown in FIG. 4A.
  • in the known voice recognition system shown in FIG. 4A, the user is not allowed to input voice during the talk-back operation.
  • in this embodiment, by contrast, when the recognition of the input voice is finished, the system automatically enters the voice input enable state. This enables the user to continue to input his/her voice at any time without having to wait until the talk-back operation is finished. As a result, the waiting time can be reduced.
  • voice input can be accepted when necessary even during a talk-back operation, and a user is able to input his/her voice at any time without having to wait until the talk-back operation is finished. As a result, the voice recognition operation time can be reduced. Moreover, the user does not have to press the speech button every time he/she issues a voice command, thereby eliminating a complicated operation.
  • the talk-back voice is removed from the input voice from the microphone 53. In addition, the articulation of the talk-back voice can be enhanced, and the voice recognition operation time can be reduced without increasing the cost.
  • a voice recognition system 200 constructed in accordance with a second embodiment of the present invention is described next. FIG. 5 is a block diagram illustrating elements of the voice recognition system 200.
  • elements having the same functions as those of the elements in FIG. 1 are indicated by like reference numerals, and an explanation thereof is thus omitted.
  • the voice recognition system 200 includes, as shown in FIG. 5, output amplifiers 6 - 1 and 6 - 2 , second adaptive filters 7 - 1 and 7 - 2 , an adder 8 , a subtractor 9 , an audio playback unit 61 , and speakers 62 - 1 and 62 - 2 having a plurality of channels (right channel and left channel).
  • the audio playback unit 61 plays back various audio sources, for example, compact discs (CDs), Mini Discs (MDs), digital versatile disks (DVDs), and radio broadcasts.
  • the output amplifiers 6 - 1 and 6 - 2 amplify audio sounds of the right and left channels played back by the audio playback unit 61 at a predetermined amplifying factor, and supply the amplified audio sounds to the speakers 62 - 1 and 62 - 2 , respectively.
  • the audio sounds are then outputted from the speakers 62 - 1 and 62 - 2 to the microphone 53 together with an input voice and a talk-back voice outputted from the speaker 52 .
  • the second adaptive filters 7 - 1 and 7 - 2 are configured, as shown in FIG. 2, similarly to the adaptive filter 4.
  • the second adaptive filter 7 - 1 identifies the filter coefficient simulating the transfer system from the right-channel speaker 62 - 1 to the microphone 53, and performs filter processing on the right-channel audio sound to generate a simulated right-channel audio sound.
  • the other second adaptive filter 7 - 2 identifies the filter coefficient simulating the transfer system from the left-channel speaker 62 - 2 to the microphone 53 , and performs filter processing on the left-channel audio sound to generate a simulated left-channel audio sound.
  • the adaptive filter 4 forms the first adaptive filter unit
  • the second adaptive filters 7 - 1 and 7 - 2 form the second adaptive filter unit.
  • the adder 8 adds the simulated right-channel and left-channel audio sounds outputted from the second adaptive filters 7 - 1 and 7 - 2 , and outputs the added simulated sound to the subtractor 9 .
  • the subtractor 5 subtracts a simulated talk-back voice generated by the adaptive filter 4 from a voice (mixed voice including a voice command, talk-back voice, audio sound, and surrounding noise) inputted into the microphone 53 so as to extract the voice command, audio sound, and surrounding noise.
  • the subtractor 9 further subtracts the simulated audio sound generated by the second adaptive filters 7 - 1 and 7 - 2 and the adder 8 from the voice outputted from the subtractor 5 so as to extract the voice command (input voice) and surrounding noise.
  • the subtractors 5 and 9 may be referred to as the input-voice extracting unit.
  • the voice recognition engine 54 reduces the surrounding noise contained in the mixed voice extracted by the subtractor 9 to recognize only the voice command.
  • the mixed voice extracted by the subtractor 5 is also fed back to the gain controller 2 and the adaptive filter 4 .
  • the mixed voice extracted by the subtractor 9 is supplied to the voice recognition engine 54 and is also fed back to the second adaptive filters 7 - 1 and 7 - 2 .
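In signal terms, the second embodiment is a cascade of two subtractions, as in the short sketch below (frame-based NumPy, with the simulated signals assumed to come from the adaptive filters described above).

```python
import numpy as np

def extract_input_voice(mic_frame, sim_talkback, sim_audio_left, sim_audio_right):
    """Two-stage extraction of the second embodiment (all arguments are
    equal-length NumPy frames produced elsewhere)."""
    stage1 = mic_frame - sim_talkback              # subtractor 5: error for ADF 4 and gain controller 2
    audio_sum = sim_audio_left + sim_audio_right   # adder 8
    stage2 = stage1 - audio_sum                    # subtractor 9: error for ADFs 7-1 and 7-2
    return stage1, stage2                          # stage2 feeds the voice recognition engine 54
```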
  • the operation of the voice recognition system 200 configured as described above is briefly discussed below.
  • the gain of the talk-back voice outputted from the voice output unit 51 is adjusted by the volume device 1 and the gain controller 2 so as to improve the articulation of the talk-back voice.
  • the talk-back voice output from the volume device 1 is then amplified in the output amplifier 3 at a predetermined amplifying factor and is then outputted from the speaker 52 .
  • the audio sounds output from the audio playback unit 61 are amplified in the output amplifiers 6 - 1 and 6 - 2 , and are then outputted from the speakers 62 - 1 and 62 - 2 , respectively.
  • the talk-back voice output from the speaker 52 and the audio sounds output from the speakers 62 - 1 and 62 - 2 are inputted into the microphone 53 .
  • if the user issues a voice command, it is also inputted into the microphone 53, and if the vehicle is running, surrounding noise, for example, engine sound or road noise, is also inputted into the microphone 53.
  • This mixed voice is inputted into the positive terminal of the subtractor 5 .
  • the simulated talk-back voice generated by the adaptive filter 4 is inputted into the negative terminal of the subtractor 5 .
  • the subtractor 5 subtracts the simulated talk-back voice output from the adaptive filter 4 from the mixed voice output from the microphone 53 so as to calculate the error and extract the audio sounds, input voice, and surrounding noise.
  • the mixed voice including the audio sounds, input voice, and surrounding noise extracted by the subtractor 5 is inputted into the positive terminal of the subtractor 9 .
  • the simulated audio sound generated by the second adaptive filters 7 - 1 and 7 - 2 and the adder 8 is inputted into the negative terminal of the subtractor 9 .
  • the subtractor 9 subtracts the simulated audio sound output from the adder 8 from the mixed voice output from the subtractor 5 to calculate the error and extract the input voice and surrounding noise.
  • the extracted input voice and surrounding noise are supplied to the voice recognition engine 54 .
  • the voice recognition engine 54 reduces the surrounding noise and recognizes the voice command.
  • the audio sounds, input voice, and surrounding noise extracted by the subtractor 5 are also fed back to the gain controller 2 and the adaptive filter 4 and are used for enhancing the articulation of the talk-back voice and for estimating the talk-back voice.
  • the input voice and surrounding noise extracted by the subtractor 9 are also fed back to the second adaptive filters 7 - 1 and 7 - 2 and are used for estimating the audio sound.
  • FIG. 6 is a flowchart illustrating the voice recognition processing performed by the voice recognition system 200 of the second embodiment.
  • the same process steps as those of FIG. 3 are indicated by like step numbers, and an explanation thereof is thus omitted.
  • the process different from that of FIG. 3 is in that acts S 21 and S 22 (removing audio sound) are inserted between acts S 2 and S 3 and between acts S 9 and S 10 , respectively.
  • the talk-back voice and audio sound are removed from the sound inputted into the microphone so as to extract the input voice and surrounding noise.
  • the extracted input voice and surrounding noise are then supplied to the voice recognition engine 54 . Accordingly, even while the talk-back operation and audio playback operation are being performed, the user can input voice at any time, thereby reducing the voice recognition operation time.
  • a voice recognition system 300 constructed in accordance with a third embodiment of the present invention is now described with reference to FIG. 7. Elements having the same functions as those of the elements shown in FIG. 5 are designated with like reference numerals, and an explanation thereof is thus omitted.
  • in the second embodiment described above, the talk-back voice and the audio sound are outputted from different speakers.
  • in the third embodiment, by contrast, a talk-back voice and an audio sound are outputted from the same speaker.
  • the output amplifier 3 shown in FIG. 5 is eliminated, and only two output amplifiers 6 - 1 and 6 - 2 are provided. Also, instead of the adaptive filter 4 shown in FIG. 5, a variable filter 10 is provided, and an adder 11 is further provided. The other elements are similar to those of FIG. 5.
  • the adder 11 adds a talk-back voice outputted from the volume device 1 and a right-channel audio sound played back by the audio playback unit 61 , and outputs the mixed voice to the output amplifier 6 - 1 and the adaptive filter 7 - 1 .
  • the output amplifier 6 - 1 amplifies the voice outputted from the adder 11 at a predetermined amplifying factor, and outputs the amplified voice from the right-channel speaker 62 - 1 .
  • the adaptive filter 7 - 1 identifies the filter coefficient simulating the transfer system from the right-channel speaker 62 - 1 to the microphone 53 . By using this identified filter coefficient, the adaptive filter 7 - 1 filters the mixed voice including the talk-back voice and the right-channel audio sound outputted from the adder 11 to generate a simulated mixed voice.
  • the variable filter 10, which is a voice correction filter, has a variable filter coefficient; the filter coefficient identified by the adaptive filter 7 - 1 is copied into the variable filter 10.
  • using this copied filter coefficient, the variable filter 10 filters the talk-back voice outputted from the volume device 1 to generate a simulated talk-back voice to be inputted into the microphone 53.
  • the right-channel adaptive filter 7 - 1 which is the copy source of the filter coefficient to be inputted into the variable filter 10 , simulates the transfer system from the right-channel speaker 62 - 1 , which outputs the talk-back voice, to the microphone 53 . If, for example, the voice recognition system 300 of this embodiment is used in a navigation system, the talk-back voice is outputted from the right-channel speaker 62 - 1 installed near the driver's seat, and the microphone 53 receiving this talk-back voice is also installed near the driver's seat. In this case, it is thus preferable that the filter coefficient of the right-channel adaptive filter 7 - 1 be copied into the variable filter 10 . If the driver's seat is at the left side of the vehicle, it is preferable that the filter coefficient of the left-channel adaptive filter 7 - 2 be copied into the variable filter 10 .
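The coefficient copy itself is trivial, which is the point of the third embodiment: the variable filter 10 reuses coefficients that an audio-channel adaptive filter has already identified. The helper below sketches that, with the `driver_side` switch and array layout assumed for illustration.

```python
import numpy as np

def simulated_talkback_via_copy(adf_right_w, adf_left_w, talkback_buf, driver_side="right"):
    """Variable filter 10: copy the coefficients identified by the adaptive filter
    nearest the driver's seat (7-1 or 7-2) and only run the FIR convolution;
    no coefficient identification of its own is needed."""
    w = np.asarray(adf_right_w if driver_side == "right" else adf_left_w, dtype=float)
    return float(w @ np.asarray(talkback_buf, dtype=float))  # one simulated talk-back sample
```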
  • the operation performed by the voice recognition system 300 of the third embodiment is briefly discussed below.
  • the gain of the talk-back voice outputted from the voice output unit 51 is adjusted by the volume device 1 and the gain controller 2 so as to improve the articulation of the talk-back voice.
  • the talk-back voice output from the volume device 1 is added to the right-channel audio sound played back by the audio playback unit 61 in the adder 11 , and the mixed voice is then amplified in the output amplifier 6 - 1 at a predetermined amplifying factor and is then outputted from the speaker 62 - 1 .
  • the left-channel audio sound played back by the audio playback unit 61 is amplified in the output amplifier 6 - 2 at a predetermined amplifying factor and is then outputted from the speaker 62 - 2.
  • the voice (including the talk-back voice and right-channel audio sound) output from the speaker 62 - 1 and the left-channel audio sound output from the speaker 62 - 2 are inputted into the microphone 53 .
  • if the user issues a voice command, it is also inputted into the microphone 53, and if the vehicle is running, surrounding noise, for example, engine sound or road noise, is also inputted into the microphone 53.
  • This mixed voice is inputted into the positive terminals of the subtractors 5 and 9 . Meanwhile, the simulated talk-back voice generated by the variable filter 10 is inputted into the negative terminal of the subtractor 5 .
  • the subtractor 5 subtracts the simulated talk-back voice output from the variable filter 10 from the mixed voice output from the microphone 53 so as to calculate the error and extract the audio sound, input voice, and surrounding noise.
  • the extracted mixed voice is fed back to the gain controller 2 and is used for enhancing the articulation of the talk-back voice.
  • the mixed voice including the talk-back voice and right-channel audio sound output from the adder 11 is also inputted into the adaptive filter 7 - 1 .
  • the adaptive filter 7 - 1 then generates a simulated voice including the talk-back voice and right-channel audio sound. Meanwhile, a simulated left-channel audio sound is generated in the adaptive filter 7 - 2 .
  • the simulated right-channel and left-channel audio sounds generated by the adaptive filters 7 - 1 and 7 - 2 , respectively, are added in the adder 8 , and the added simulated audio sound is inputted into the negative terminal of the subtractor 9 .
  • the subtractor 9 subtracts the simulated voice including the talk-back voice and audio sound output from the adder 8 from the mixed voice output from the subtractor 5 so as to calculate the error and extract the input voice and surrounding noise.
  • the input voice and surrounding noise extracted by the subtractor 9 are supplied to the voice recognition engine 54 .
  • the voice recognition engine 54 reduces noise to recognize the voice command (input voice).
  • the input voice and surrounding noise extracted by the subtractor 9 are also fed back to the adaptive filters 7 - 1 and 7 - 2 and are used for estimating the audio sound.
  • the voice recognition processing performed by the voice recognition system 300 of the third embodiment is similar to that shown in FIG. 6, and an explanation thereof is thus omitted.
  • the variable filter 10 does not have to perform computation for identifying filter coefficients because it copies the filter coefficient from the adaptive filter 7 - 1, thereby reducing the processing load.

Abstract

A voice recognition system includes an adaptive filter and a subtractor. The adaptive filter generates a simulated talk-back voice y(n) by setting a filter coefficient simulating a transfer system in which an input voice corresponding to a voice command and a talk-back voice output from a speaker are input into a microphone and by filtering a talk-back voice x(n). The subtractor extracts the input voice by subtracting the simulated talk-back voice y(n) from mixed sound input into the microphone. With this configuration, the talk-back voice is attenuated from the mixed sound, including the input voice and the talk-back voice, inputted into the microphone, and then the mixed sound is supplied to a voice recognition engine. Accordingly, the user can input his/her voice during a talk-back operation without the need to interrupt it by pressing a speech button every time the user wishes to input the voice. The voice recognition operation time can thus be reduced.

Description

    BACKGROUND OF THE INVENTION
  • 1. Field of the Invention [0001]
  • The present invention relates to voice recognition systems and methods for recognizing voice commands issued by users so as to control devices, and more particularly, to a voice recognition system having a talk-back function of feeding back the recognized voice to a user. [0002]
  • 2. Description of the Related Art [0003]
  • The presently preferred embodiments relate to a voice recognition system that allows a user to input his/her voice to operate a device such as a navigation system, hands-free device, or personal computer mounted in a vehicle. Such a voice recognition system may be used in addition to or instead of a remote control, a touch panel, a keyboard, or a mouse. [0004]
  • In this type of voice recognition system, when a user presses a speech button provided for the system, the system enters a voice recognition mode, the user's input voice is recognized, and a voice command is executed. There are two approaches to inputting voice. In a first approach, when a user presses the speech button once, the system enters the voice recognition mode, and the system instructs the user to input his/her voice when necessary so that the user and the system interactively communicate with each other. In a second approach, every time the user presses the speech button, the user can input his/her voice only for a predetermined time period. [0005]
  • Most of the voice recognition systems have a talk-back function of feeding back the recognized voice to the user via a speaker. The user listens to the talk-back voice to check whether it has been correctly recognized. If the recognition result is wrong, the user inputs his/her voice once again, and if the recognition result is correct, the user supplies the corresponding information to the system. In response to the user's instruction, the voice recognition system performs various controls. [0006]
  • Normally, a plurality of voice commands used in the voice recognition system are divided into a plurality of levels according to the type of operation to be performed on a device to be controlled. For example, to specify a destination in a navigation system by inputting an address, the user inputs the address aloud by dividing it into a plurality of levels, such as “prefecture→city, town (or village)→the rest of the address”. [0007]
  • In this case, every time the user inputs his/her voice, the input voice for each level is spoken back, and thus, it takes time to finish the voice input of the complete address of the destination. To overcome this drawback, attempts have been made to reduce the voice recognition time. As one example, Japanese Unexamined Patent Application Publication No. 6-149287 discloses a system in which the voice recognition time is reduced by decreasing the computation amount of a talk-back voice. [0008]
  • In known voice recognition systems, however, while a talk-back voice is outputted, the next voice input is not accepted. If the talk-back voice is mixed with a voice input by a user, incorrect recognition of the input voice is likely to occur. FIG. 4A illustrates a timing chart of the voice input enable state in a known voice recognition system. In FIG. 4A, the above-described first approach is adopted to input the voice. [0009]
  • As shown in FIG. 4A, in the first approach, when the user first presses the speech button, the system enters the voice recognition state to receive the voice input for a predetermined time period. During this period, the user inputs desired voice commands. After the user inputs the voice, the voice recognition system recognizes the input voice and outputs a talk-back voice. During this period, voice input is not accepted. After the talk-back operation, the system once again enters the voice input enable state to enable the user to input his/her voice. [0010]
  • Accordingly, in this first approach, the user cannot input his/her voice while the talk-back operation is being performed. In other words, the user has to wait until the talk-back operation is finished, and thus, it takes time to finish voice input. [0011]
  • In the second approach, the user can press the speech button to interrupt the talk-back operation and continue to input his/her voice. In this case, however, when inputting the voice for a plurality of levels, the user has to press the speech button every time he/she inputs voice for each level, thereby making the operation very complicated. [0012]
  • SUMMARY OF THE INVENTION
  • Accordingly, in view of the above-described problems, it is an object of the present invention to reduce the voice recognition operation time without the need to perform a complicated operation, such as pressing a speech button many times. [0013]
  • In order to achieve the above object, according to one embodiment of the present invention, there is provided a voice recognition system having a talk-back function of recognizing a voice input into a microphone and outputting a talk-back voice from a speaker. The voice recognition system includes: an adaptive filter unit for generating a simulated talk-back voice inputted into the microphone by setting a filter coefficient simulating a transfer system in which the talk-back voice outputted from the speaker is inputted into the microphone and by filtering the talk-back voice before being outputted from the speaker; and an input-voice extracting unit for extracting the input voice by subtracting the simulated talk-back voice from sound inputted into the microphone. [0014]
  • According to another embodiment of the present invention, there is provided a voice recognition system having a talk-back function of recognizing a voice input into a microphone and outputting a talk-back voice from a first speaker. The voice recognition system includes: a first adaptive filter unit for generating a simulated talk-back voice inputted into the microphone by setting a first filter coefficient simulating a transfer system in which the talk-back voice outputted from the first speaker is inputted into the microphone and by filtering the talk-back voice before being outputted from the first speaker; a second adaptive filter unit for generating a simulated audio sound inputted into the microphone by setting a second filter coefficient simulating a transfer system in which an audio sound outputted from a second speaker is inputted into the microphone and by filtering the audio sound before being outputted from the second speaker; and an input-voice extracting unit for extracting the input voice by subtracting the simulated talk-back voice and the simulated audio sound from sound inputted into the microphone. [0015]
  • According to still another embodiment of the present invention, there is provided a voice recognition system having a talk-back function of recognizing a voice inputted into a microphone and outputting a talk-back voice from a speaker. The voice recognition system includes: a first adaptive filter unit for generating a simulated mixed voice inputted into the microphone by setting a first filter coefficient simulating a transfer system in which a mixed voice including the talk-back voice outputted from the speaker and an audio sound are inputted into the microphone and by filtering the mixed voice before being outputted from the speaker; and an input-voice extracting unit for extracting the input voice by subtracting the simulated mixed voice from sound inputted into the microphone. [0016]
  • According to a further embodiment of the present invention, there is provided a voice recognition method including the acts of: setting a voice input state to a disable state in which voice input is not accepted when recognizing a voice input into a microphone by a recognition processor; setting a voice input state to an enable state in which voice input is accepted when starting outputting from a speaker a talk-back voice obtained as a result of recognizing the voice by the recognition processor; generating a simulated talk-back voice to be inputted into the microphone by setting in an adaptive filter unit a filter coefficient simulating a transfer system in which the talk-back voice outputted from the speaker is inputted into the microphone and by filtering the talk-back voice before being outputted from the speaker; and extracting the input voice and supplying the extracted voice to the recognition processor by subtracting the simulated talk-back voice from sound inputted into the microphone when the voice input state is set in the enable state. [0017]
  • According to a yet further embodiment of the present invention, there is provided a voice recognition method including the acts of: setting a voice input state to a disable state in which voice input is not accepted when recognizing a voice input into a microphone by a recognition processor; setting a voice input state to an enable state in which voice input is accepted when starting outputting from a first speaker a talk-back voice as a result of recognizing the voice by the recognition processor; generating a simulated talk-back voice to be inputted into the microphone by setting in a first adaptive filter unit a first filter coefficient simulating a transfer system in which the talk-back voice outputted from the first speaker is inputted into the microphone and by filtering the talk-back voice before being outputted from the first speaker; generating a simulated audio sound to be inputted into the microphone by setting in a second adaptive filter unit a second filter coefficient simulating a transfer system in which an audio sound outputted from a second speaker is inputted into the microphone and by filtering the audio sound before being outputted from the second speaker; and extracting the input voice and supplying the extracted voice to the recognition processor by subtracting the simulated talk-back voice and the simulated audio sound from sound inputted into the microphone when the voice input state is set in the enable state. [0018]
  • According to a further embodiment of the present invention, there is provided a voice recognition method including the acts of: setting a voice input state to a disable state in which voice input is not accepted when recognizing a voice input into a microphone by a recognition processor; setting a voice input state to an enable state in which voice input is accepted when starting outputting from a speaker a talk-back voice obtained as a result of recognizing the voice by the recognition processor; generating a simulated mixed voice to be inputted into the microphone by setting in an adaptive filter unit a filter coefficient simulating a transfer system in which a mixed voice including the talk-back voice and an audio sound output from the speaker is inputted into the microphone and by filtering the mixed voice before being outputted from the speaker; and extracting the input voice and supplying the extracted voice to the recognition processor by subtracting the simulated mixed voice from sound inputted into the microphone when the voice input state is set in the enable state. [0019]
  • According to the presently preferred embodiments of the present invention, a talk-back voice outputted from a speaker and inputted into a microphone is estimated by the adaptive filter unit. The estimated value of the talk-back voice is then subtracted from sound inputted into the microphone. Thus, only the input voice can be extracted from the sound including the input voice and other sound. With this configuration, the user can input his/her voice at any time even during the talk-back operation without performing a complicated operation, for example, interrupting the talk-back operation by inputting a speech button every time the user wishes to input voice. As a result, the voice recognition operation time can be reduced.[0020]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram illustrating elements of a voice recognition system according to a first embodiment of the present invention; [0021]
  • FIG. 2 illustrates the configuration of an adaptive filter used in the voice recognition system shown in FIG. 1; [0022]
  • FIG. 3 is a flowchart illustrating the voice recognition processing performed by the voice recognition system shown in FIG. 1; [0023]
  • FIG. 4A is a timing chart illustrating the voice input enable state of a known voice recognition system; [0024]
  • FIG. 4B is a timing chart illustrating the voice input enable state of the voice recognition system of the first embodiment; [0025]
  • FIG. 5 is a block diagram illustrating elements of a voice recognition system according to a second embodiment of the present invention; [0026]
  • FIG. 6 is a flowchart illustrating the voice recognition processing performed by the voice recognition system shown in FIG. 5; and [0027]
  • FIG. 7 is a block diagram illustrating the essential elements of a voice recognition system according to a third embodiment of the present invention. [0028]
  • DETAILED DESCRIPTION OF THE DRAWINGS AND THE PRESENTLY PREFERRED EMBODIMENTS
  • The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings through illustration of preferred embodiments. [0029]
  • First Embodiment [0030]
  • A voice recognition system 100 constructed in accordance with a first embodiment of the present invention is described below with reference to FIGS. 1 through 4. [0031]
  • In FIG. 1, the voice recognition system 100 includes a volume device or an equalizer (hereinafter referred to as the “volume device”) 1, a gain controller 2, an output amplifier 3, an adaptive filter (ADF) 4, a subtractor 5, a voice output unit 51, a speaker 52, a microphone 53, and a voice recognition engine 54. [0032]
  • The voice output unit 51 generates a talk-back voice and outputs it. The gain of the talk-back voice is then controlled in the volume device 1, the resulting talk-back voice is amplified in the output amplifier 3, and then it is outputted from the speaker 52. The microphone 53 is used for inputting the user's voice. In practice, however, a talk-back voice output from the speaker 52 and surrounding noise, for example, engine noise (noise occurring when a vehicle is running), are also inputted into the microphone 53. The voice recognition engine 54 recognizes the input voice from the microphone 53 and executes a command corresponding to the input voice for a device to be controlled (not shown), for example, a navigation system. [0033]
  • The [0034] adaptive filter 4 includes, as shown in FIG. 2, a coefficient identification unit 21 and a voice correction filter 22. The coefficient identification unit 21 is a filter for identifying the transfer function (the filter coefficient of the voice correction filter 22) of the acoustic system from the speaker 52 to the microphone 53, and more specifically, the coefficient identification unit 21 is an adaptive filter using the least mean square (LMS) algorithm or normalized-LMS (N-LMS) algorithm. The coefficient identification unit 21 is operated so that the power of the error e(n) output from the subtractor 5 is minimized so as to identify the impulse response of the acoustic system.
  • The [0035] voice correction filter 22 performs convolutional computation by using the filter coefficient w(n) determined by the coefficient identification unit 21 and a talk-back voice x(n) to be controlled so as to supply the same transfer characteristic as that of the acoustic system to the talk-back voice x(n). As a result, a simulated talk-back voice y(n) simulating the talk-back voice inputted into the microphone 53 is generated. The adaptive filter 4 may also be referred to as an adaptive filter unit.
  • The [0036] subtractor 5 subtracts the simulated talk-back voice y(n) generated by the adaptive filter 4 from the voice (mixed voice including the voice command, talk-back voice, and surrounding noise) inputted into the microphone 53 so as to extract the voice command (input voice) and surrounding noise (for example, engine noise). The subtractor 5 may be referred to as an input-voice extracting unit.
  • The mixed voice including the input voice and surrounding noise extracted by the [0037] subtractor 5 is supplied to the voice recognition engine 54. After performing typical noise processing, for example, filter processing or spectrum subtraction, the voice recognition engine 54 recognizes the voice command. The mixed voice extracted by the subtractor 5 is also fed back to the coefficient identification unit 21 of the adaptive filter 4 and the gain controller 2 as the error e(n).
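The "spectrum subtraction" mentioned above as typical noise processing could look like the textbook magnitude-domain sketch below; the over-subtraction factor and spectral floor are assumed values, and the engine's actual algorithm is not disclosed in the patent.

```python
import numpy as np

def spectral_subtraction(frame, noise_mag, over_sub=1.5, floor=0.02):
    """Suppress quasi-stationary noise in one windowed frame before recognition.
    `noise_mag` is an average magnitude spectrum measured during speech pauses
    (length len(frame)//2 + 1, matching rfft)."""
    spec = np.fft.rfft(frame * np.hanning(len(frame)))
    mag, phase = np.abs(spec), np.angle(spec)
    clean_mag = np.maximum(mag - over_sub * noise_mag, floor * mag)  # spectral floor
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(frame))
```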
  • Based on the simulated talk-back voice y(n) output from the [0038] adaptive filter 4 and the mixed voice e(n) including the input voice and surrounding noise outputted from the subtractor 5, the gain controller 2 calculates the optimal gain to be added to the talk-back voice output from the voice output unit 51, and outputs the calculated gain to the volume device 1. In this case, the mixed voice e(n) is considered as noise for the talk-back voice, and the gain of the talk-back voice to be outputted from the speaker 52 is adjusted so that the talk-back voice can be articulated to the user.
  • The [0039] volume device 1 performs gain correction for the talk-back voice outputted from the voice output unit 51. More specifically, the volume device 1 corrects for the talk-back voice outputted from the voice output unit 51 by supplying the gain calculated by the gain controller 2 to the talk-back voice. This correction is conducted for, for example, each of a plurality of divided frequency bands.
  • The operation of the voice recognition system 100 configured as described above is briefly described. The gain of the talk-back voice outputted from the voice output unit 51 is adjusted by the volume device 1 and the gain controller 2 so as to improve the articulation of the talk-back voice. The talk-back voice outputted from the volume device 1 is amplified in the output amplifier 3 at a predetermined amplifying factor, and is then outputted from the speaker 52. [0040]
  • The talk-back voice outputted from the speaker 52 is inputted into the microphone 53. In this case, if the user issues a voice command, the voice command is also inputted into the microphone 53, and if the vehicle is running, surrounding noise, for example, engine sound or road noise, is also inputted into the microphone 53. Accordingly, the talk-back voice, input voice, and surrounding noise are inputted into the microphone 53 in a mixed manner. This mixed voice is inputted into the positive terminal of the subtractor 5. Meanwhile, the simulated talk-back voice (estimated value of the talk-back voice) generated by the adaptive filter 4 is inputted into the negative terminal of the subtractor 5. [0041]
  • The [0042] subtractor 5 subtracts the simulated talk-back voice outputted from the adaptive filter 4 from the mixed voice output from the microphone 53 to calculate the error and extract the input voice and surrounding noise. The extracted input voice and surrounding noise are supplied to the voice recognition engine 54. The voice recognition engine 54 then performs noise reduction and voice-command recognition. The extracted input voice and surrounding noise are fed back to the gain controller 2 and the adaptive filter 4 so that they can be used for improving the articulation of the talk-back voice and for calculating the estimated value of the talk-back voice.
  • FIG. 3 is a flowchart illustrating the voice recognition processing performed by the [0043] voice recognition system 100 shown in FIG. 1 of the first embodiment. Although it is not shown in FIG. 1, the voice recognition system 100 is provided with a controller for controlling the entire voice recognition operation. The flowchart of FIG. 3 is executed under the control of this controller.
  • In act S1 [0044], the controller detects a voice recognition start trigger (for example, an operation for pressing the speech button or voice input of a predetermined keyword). Then, in act S2, the controller turns on the voice recognition engine 54 so that the system enters the voice input enable state. In this state, in act S3, the user inputs a first command, which is the topmost level of voice commands consisting of a plurality of levels.
  • The issued voice command is inputted into the [0045] microphone 53, and is supplied to the voice recognition engine 54 via the subtractor 5. Then, in act S4, the voice recognition engine 54 performs voice recognition processing (including noise reduction). In this case, the controller turns off the voice recognition engine 54 to cancel the voice input enable state. In act S5, the volume device 1 and the gain controller 2 start improving the articulation of the talk-back voice. In this state, in act S6, the voice output unit 51 starts outputting a recognition result obtained from the voice recognition engine 54 and a talk-back voice.
  • In act S7 [0046], the controller determines during this talk-back operation whether it is necessary to continue to input voice by, for example, shifting to a lower level of commands. If the outcome of act S7 is yes, the process proceeds to act S8 in which the controller turns on the voice recognition engine 54 again so that the system enters the voice input enable state. Subsequently, in act S9, the subtractor 5 obtains the estimated value of the talk-back voice output in act S6 from the adaptive filter 4, and subtracts the estimated value from the input voice from the microphone 53 so as to remove the talk-back voice from the input voice.
  • The controller then determines in act S10 [0047] whether a voice command has been issued. If a voice command is not issued, the process returns to act S9, and the loop operation is repeated until a voice command is issued. If a voice command is not issued for a predetermined period, the time-out processing is performed. If a voice command is issued in act S10, the process proceeds to act S11 in which the controller interrupts the talk-back operation, and returns to act S4. In this process, the talk-back operation is interrupted when a voice command is issued. It is not essential, however, that the talk-back operation be interrupted since the talk-back voice has been removed from the input voice.
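  • For readers who prefer code to flowcharts, acts S1 through S11 can be summarized as the control loop sketched below. Every method on the hypothetical system object is a placeholder invented for this sketch; only the ordering of the steps follows the description of FIG. 3.

```python
def voice_recognition_loop(system):
    """Sketch of the FIG. 3 control flow; every 'system' method is hypothetical."""
    system.wait_for_start_trigger()              # S1: speech button press or keyword
    system.enable_voice_input()                  # S2: turn on the recognition engine
    command = system.capture_command()           # S3: top-level voice command
    while True:
        system.disable_voice_input()
        result = system.recognize(command)       # S4: noise reduction + recognition
        system.start_articulation_control()      # S5: volume device + gain controller
        system.start_talk_back(result)           # S6: feed back the recognition result
        if not system.needs_more_input(result):  # S7: lower command level needed?
            return
        system.enable_voice_input()              # S8: accept voice during talk-back
        while True:
            frame = system.cancel_talk_back()    # S9: subtract simulated talk-back voice
            if system.command_detected(frame):   # S10
                system.interrupt_talk_back()     # S11 (optional per the description)
                command = frame
                break
            if system.timed_out():               # time-out processing
                return
```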
  • FIG. 4B is a timing chart illustrating a change in the voice input enable state of the [0048] voice recognition system 100 of this embodiment in comparison with that of a known voice recognition system shown in FIG. 4A. The operation of the known voice recognition system has been discussed above.
  • As shown in FIG. 4B, in this embodiment, when the user first presses the speech button, the system enters the voice recognition mode to enable the user to input voice for a predetermined time period. During this period, the user issues desired voice commands. When a voice command is issued, the input voice is recognized, and a talk-back voice is outputted. The operation up to this stage is the same as that of the known voice recognition system shown in FIG. 4A. [0049]
  • In the known voice recognition system shown in FIG. 4A, the user is not allowed to input voice during the talk-back operation. Conversely, in this embodiment shown in FIG. 4B, when the recognition of the input voice is finished, the system automatically enters the voice input enable state. This enables the user to continue to input his/her voice at any time without having to wait until the talk-back operation is finished. As a result, the waiting time can be reduced. [0050]
  • As described in detail above, according to the first embodiment, voice input can be accepted when necessary even during a talk-back operation, and a user is able to input his/her voice at any time without having to wait until the talk-back operation is finished. Thus, the voice recognition operation time can be reduced. Additionally, the user does not have to press the speech button every time he/she issues a voice command, thereby eliminating a complicated operation. [0051]
  • In the first embodiment, by using the simulated talk-back voice estimated in the [0052] adaptive filter 4, which is provided for enhancing the articulation of the talk-back voice, the talk-back voice is removed from the input voice from the microphone 53. This eliminates the need to separately provide a dedicated adaptive filter. Thus, the articulation of the talk-back voice can be enhanced, and also, the voice recognition operation time can be reduced without increasing the cost.
  • Second Embodiment [0053]
  • A [0054] voice recognition system 200 constructed in accordance with a second embodiment of the present invention is described below with reference to FIGS. 5 and 6. FIG. 5 is a block diagram illustrating elements of the voice recognition system 200. In FIG. 5, elements having the same functions as those of the elements in FIG. 1 are indicated by like reference numerals, and an explanation thereof is thus omitted.
  • In addition to the elements of the [0055] voice recognition system 100 shown in FIG. 1, the voice recognition system 200 includes, as shown in FIG. 5, output amplifiers 6-1 and 6-2, second adaptive filters 7-1 and 7-2, an adder 8, a subtractor 9, an audio playback unit 61, and speakers 62-1 and 62-2 having a plurality of channels (right channel and left channel).
  • The [0056] audio playback unit 61 plays back various audio sources, for example, compact discs (CDs), Mini Discs (MDs), digital versatile disks (DVDs), and radio broadcasts. The output amplifiers 6-1 and 6-2 amplify audio sounds of the right and left channels played back by the audio playback unit 61 at a predetermined amplifying factor, and supply the amplified audio sounds to the speakers 62-1 and 62-2, respectively. The audio sounds are then outputted from the speakers 62-1 and 62-2 to the microphone 53 together with an input voice and a talk-back voice outputted from the speaker 52.
  • The second adaptive filters [0057] 7-1 and 7-2 are configured, as shown in FIG. 2, similarly to the adaptive filter 4. The second adaptive filter 7-1 identifies the filter coefficient simulating the transfer system from the right-channel speaker 62-1 to the microphone 53, and performs filter processing on the right-channel audio sound to generate a simulated right-channel audio sound.
  • The other second adaptive filter [0058] 7-2 identifies the filter coefficient simulating the transfer system from the left-channel speaker 62-2 to the microphone 53, and performs filter processing on the left-channel audio sound to generate a simulated left-channel audio sound.
  • In the second embodiment, the [0059] adaptive filter 4 forms the first adaptive filter unit, and the second adaptive filters 7-1 and 7-2 form the second adaptive filter unit. The adder 8 adds the simulated right-channel and left-channel audio sounds outputted from the second adaptive filters 7-1 and 7-2, and outputs the added simulated sound to the subtractor 9.
  • In this embodiment, the [0060] subtractor 5 subtracts a simulated talk-back voice generated by the adaptive filter 4 from a voice (mixed voice including a voice command, talk-back voice, audio sound, and surrounding noise) inputted into the microphone 53 so as to extract the voice command, audio sound, and surrounding noise. The subtractor 9 further subtracts the simulated audio sound generated by the second adaptive filters 7-1 and 7-2 and the adder 8 from the voice outputted from the subtractor 5 so as to extract the voice command (input voice) and surrounding noise. The subtractors 5 and 9 may be referred to as the input-voice extracting unit.
  • The [0061] voice recognition engine 54 reduces the surrounding noise contained in the mixed voice extracted by the subtractor 9 to recognize only the voice command. The mixed voice extracted by the subtractor 5 is also fed back to the gain controller 2 and the adaptive filter 4. The mixed voice extracted by the subtractor 9 is supplied to the voice recognition engine 54 and is also fed back to the second adaptive filters 7-1 and 7-2.
  • The operation of the [0062] voice recognition system 200 configured as described above is briefly discussed below. The gain of the talk-back voice outputted from the voice output unit 51 is adjusted by the volume device 1 and the gain controller 2 so as to improve the articulation of the talk-back voice. The talk-back voice output from the volume device 1 is then amplified in the output amplifier 3 at a predetermined amplifying factor and is then outputted from the speaker 52.
  • The audio sounds output from the [0063] audio playback unit 61 are amplified in the output amplifiers 6-1 and 6-2, and are then outputted from the speakers 62-1 and 62-2, respectively.
  • The talk-back voice output from the [0064] speaker 52 and the audio sounds output from the speakers 62-1 and 62-2 are inputted into the microphone 53. In this case, if the user issues a voice command, it is also inputted into the microphone 53, and if the vehicle is running, surrounding noise, for example, engine sound or road noise, is also inputted into the microphone 53. Accordingly, the talk-back voice, audio sounds, input voice, and surrounding noise are inputted into the microphone 53 in a mixed manner.
  • This mixed voice is inputted into the positive terminal of the [0065] subtractor 5. Meanwhile, the simulated talk-back voice generated by the adaptive filter 4 is inputted into the negative terminal of the subtractor 5. The subtractor 5 subtracts the simulated talk-back voice output from the adaptive filter 4 from the mixed voice output from the microphone 53 so as to calculate the error and extract the audio sounds, input voice, and surrounding noise.
  • The mixed voice including the audio sounds, input voice, and surrounding noise extracted by the [0066] subtractor 5 is inputted into the positive terminal of the subtractor 9. Meanwhile, the simulated audio sound generated by the second adaptive filters 7-1 and 7-2 and the adder 8 is inputted into the negative terminal of the subtractor 9. The subtractor 9 subtracts the simulated audio sound output from the adder 8 from the mixed voice output from the subtractor 5 to calculate the error and extract the input voice and surrounding noise.
  • The extracted input voice and surrounding noise are supplied to the [0067] voice recognition engine 54. The voice recognition engine 54 reduces the surrounding noise and recognizes the voice command. The audio sounds, input voice, and surrounding noise extracted by the subtractor 5 are also fed back to the gain controller 2 and the adaptive filter 4 and are used for enhancing the articulation of the talk-back voice and for estimating the talk-back voice. The input voice and surrounding noise extracted by the subtractor 9 are also fed back to the second adaptive filters 7-1 and 7-2 and are used for estimating the audio sound.
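  • The cascade of the two subtractors can be summarized in a few lines of code. The sketch below assumes the simulated signals have already been produced by the adaptive filters (for example, with an NLMS routine like the one shown earlier); the function and variable names are illustrative only.

```python
import numpy as np

def two_stage_extract(mic, sim_talk_back, sim_audio_right, sim_audio_left):
    """Sketch of the subtractor 5 / adder 8 / subtractor 9 cascade (second embodiment).

    All arguments are equal-length 1-D arrays; the simulated signals would come
    from adaptive filters such as the NLMS sketch shown earlier.
    """
    stage1 = mic - sim_talk_back                   # subtractor 5: remove the talk-back voice
    sim_audio = sim_audio_right + sim_audio_left   # adder 8: sum the simulated audio channels
    stage2 = stage1 - sim_audio                    # subtractor 9: remove the audio sound
    # stage1 feeds back to adaptive filter 4 and the gain controller;
    # stage2 feeds back to the second adaptive filters and goes to the engine.
    return stage1, stage2
```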
  • FIG. 6 is a flowchart illustrating the voice recognition processing performed by the [0068] voice recognition system 200 of the second embodiment. In FIG. 6, the same process steps as those of FIG. 3 are indicated by like step numbers, and an explanation thereof is thus omitted. The difference from the process of FIG. 3 is that acts S21 and S22 (removing the audio sound) are inserted between acts S2 and S3 and between acts S9 and S10, respectively.
  • In acts S21 and S22 [0069], the estimated value of the audio sound outputted from the adder 8 is subtracted by the subtractor 9 from the mixed voice, including the audio sound, input voice, and surrounding noise, outputted from the subtractor 5 to remove the audio sound from the mixed voice. The input voice and surrounding noise are then extracted.
  • As described in detail above, according to the second embodiment, even if a voice is inputted during the talk-back operation and audio playback operation, the talk-back voice and audio sound are removed from the mixed voice outputted from the microphone so as to extract the input voice and surrounding noise. The extracted input voice and surrounding noise are then supplied to the [0070] voice recognition engine 54. Accordingly, even while the talk-back operation and audio playback operation are being performed, the user can input voice at any time, thereby reducing the voice recognition operation time.
  • Third Embodiment [0071]
  • A [0072] voice recognition system 300 constructed in accordance with a third embodiment of the present invention is now described with reference to FIG. 7. Elements having the same functions as those of the elements shown in FIG. 5 are designated with like reference numerals, and an explanation thereof is thus omitted.
  • In the second embodiment shown in FIG. 5, the talk-back voice and the audio sound are outputted from different elements. In contrast, in the third embodiment shown in FIG. 7, the talk-back voice and the audio sound are outputted from the same element. [0073]
  • In the [0074] voice recognition system 300 shown in FIG. 7, the output amplifier 3 shown in FIG. 5 is eliminated, and only two output amplifiers 6-1 and 6-2 are provided. Also, instead of the adaptive filter 4 shown in FIG. 5, a variable filter 10 is provided, and an adder 11 is further provided. The other elements are similar to those of FIG. 5.
  • In FIG. 7, the [0075] adder 11 adds a talk-back voice outputted from the volume device 1 and a right-channel audio sound played back by the audio playback unit 61, and outputs the mixed voice to the output amplifier 6-1 and the adaptive filter 7-1. The output amplifier 6-1 amplifies the voice outputted from the adder 11 at a predetermined amplifying factor, and outputs the amplified voice from the right-channel speaker 62-1.
  • The adaptive filter [0076] 7-1 identifies the filter coefficient simulating the transfer system from the right-channel speaker 62-1 to the microphone 53. By using this identified filter coefficient, the adaptive filter 7-1 filters the mixed voice including the talk-back voice and the right-channel audio sound outputted from the adder 11 to generate a simulated mixed voice.
  • The [0077] variable filter 10, which is a voice correction filter, holds a variable filter coefficient into which the filter coefficient identified by the adaptive filter 7-1 is copied. The variable filter 10 filters the talk-back voice outputted from the volume device 1 to generate a simulated talk-back voice to be inputted into the microphone 53.
  • The right-channel adaptive filter [0078] 7-1, which is the copy source of the filter coefficient to be inputted into the variable filter 10, simulates the transfer system from the right-channel speaker 62-1, which outputs the talk-back voice, to the microphone 53. If, for example, the voice recognition system 300 of this embodiment is used in a navigation system, the talk-back voice is outputted from the right-channel speaker 62-1 installed near the driver's seat, and the microphone 53 receiving this talk-back voice is also installed near the driver's seat. In this case, it is thus preferable that the filter coefficient of the right-channel adaptive filter 7-1 be copied into the variable filter 10. If the driver's seat is at the left side of the vehicle, it is preferable that the filter coefficient of the left-channel adaptive filter 7-2 be copied into the variable filter 10.
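  • A brief sketch of this coefficient-copying arrangement is shown below. The class name, tap count, and method names are invented for this example; the point is simply that the voice correction filter applies the copied transfer-path estimate without running its own coefficient-identification algorithm.

```python
import numpy as np

class VariableFilter:
    """Sketch of a coefficient-copying voice correction filter (variable filter 10)."""

    def __init__(self, n_taps=128):
        self.w = np.zeros(n_taps)      # copied coefficients; no adaptation in this filter
        self.buf = np.zeros(n_taps)    # recent talk-back samples

    def copy_coefficients(self, identified_w):
        # Copy the transfer-path estimate identified by the driver-side channel's
        # adaptive filter (filter 7-1 for a right-side driver's seat).
        self.w = np.array(identified_w, copy=True)

    def step(self, talk_back_sample):
        # Filter one talk-back sample to estimate how it arrives at the microphone.
        self.buf = np.roll(self.buf, 1)
        self.buf[0] = talk_back_sample
        return float(self.w @ self.buf)
```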
  • The operation performed by the [0079] voice recognition system 300 of the third embodiment is briefly discussed below. The gain of the talk-back voice outputted from the voice output unit 51 is adjusted by the volume device 1 and the gain controller 2 so as to improve the articulation of the talk-back voice.
  • The talk-back voice output from the [0080] volume device 1 is added to the right-channel audio sound played back by the audio playback unit 61 in the adder 11, and the mixed voice is then amplified in the output amplifier 6-1 at a predetermined amplifying factor and is then outputted from the speaker 62-1. The left-channel audio sound played back by the audio playback unit 61 is amplified in the output amplifier 6-2 at a predetermined amplifying factor and is then outputted from the speaker 62-2.
  • The voice (including the talk-back voice and right-channel audio sound) output from the speaker [0081] 62-1 and the left-channel audio sound output from the speaker 62-2 are inputted into the microphone 53. In this case, if the user issues a voice command, it is also inputted into the microphone 53, and if the vehicle is running, surrounding noise, for example, engine sound or road noise, is also inputted into the microphone 53. Accordingly, the talk-back voice, left-channel and right-channel audio sounds, input voice, and surrounding noise are inputted into the microphone 53 in a mixed manner.
  • This mixed voice is inputted into the positive terminals of the [0082] subtractors 5 and 9. Meanwhile, the simulated talk-back voice generated by the variable filter 10 is inputted into the negative terminal of the subtractor 5. The subtractor 5 subtracts the simulated talk-back voice output from the variable filter 10 from the mixed voice output from the microphone 53 so as to calculate the error and extract the audio sound, input voice, and surrounding noise. The extracted mixed voice is fed back to the gain controller 2 and is used for enhancing the articulation of the talk-back voice.
  • The mixed voice including the talk-back voice and right-channel audio sound output from the [0083] adder 11 is also inputted into the adaptive filter 7-1. The adaptive filter 7-1 then generates a simulated voice including the talk-back voice and right-channel audio sound. Meanwhile, a simulated left-channel audio sound is generated in the adaptive filter 7-2.
  • The simulated mixed voice (including the talk-back voice and right-channel audio sound) and the simulated left-channel audio sound generated by the adaptive filters [0084] 7-1 and 7-2, respectively, are added in the adder 8, and the resulting simulated signal is inputted into the negative terminal of the subtractor 9. The subtractor 9 subtracts the simulated signal including the talk-back voice and audio sounds outputted from the adder 8 from the mixed voice outputted from the microphone 53 so as to calculate the error and extract the input voice and surrounding noise.
  • The input voice and surrounding noise extracted by the subtractor [0085] 9 are supplied to the voice recognition engine 54. The voice recognition engine 54 reduces noise to recognize the voice command (input voice). The input voice and surrounding noise extracted by the subtractor 9 are also fed back to the adaptive filters 7-1 and 7-2 and are used for estimating the audio sound.
  • The voice recognition processing performed by the [0086] voice recognition system 300 of the third embodiment is similar to that shown in FIG. 6, and an explanation thereof is thus omitted.
  • As described in detail above, as in the second embodiment, according to the third embodiment, voice input is accepted during the talk-back operation and audio playback operation when necessary, and the user can input his/her voice at any time during this period. In the third embodiment, it is not necessary to provide an additional adaptive filter containing an algorithm for identifying filter coefficients in order to estimate the talk-back voice, thereby reducing the cost. The [0087] variable filter 10 does not have to perform computation for identifying filter coefficients because it copies the filter coefficient from the adaptive filter 7-1, thereby reducing the processing load.
  • While the present invention has been described with reference to the above-described embodiments, these embodiments are examples only to carry out the invention, and the technical scope of the invention should not be restricted by these embodiments. Various modifications and changes can be made to the present invention without departing from the spirit or major features of the invention. [0088]

Claims (16)

1. A voice recognition system, comprising:
a microphone;
a speaker operable to output a talk-back voice;
an adaptive filter unit comprising a voice correction unit and a coefficient identification unit, the voice correction unit operable to generate a simulated talk-back voice to be inputted into the microphone and the coefficient identification unit operable to generate a coefficient for use by the voice correction unit;
an input-voice extracting unit connected with the adaptive filter unit and operable to extract a voice input by subtracting the simulated talk-back voice from sound received by the microphone; and
a gain controller connected with the adaptive filter unit and the input-voice extracting unit and operable to adjust the volume of the talk-back voice.
2. The voice recognition system of claim 1, wherein the simulated talk-back voice and the voice input extracted from the input-voice extracting unit are inputted into the gain controller.
3. A voice recognition system, comprising:
a microphone;
a first speaker operable to output a talk-back voice;
a second speaker operable to output an audio sound;
a first adaptive filter unit operable to generate a simulated talk-back voice by setting a first filter coefficient;
a second adaptive filter unit operable to generate a simulated audio sound by setting a second filter coefficient; and
an input-voice extracting unit operable to extract a voice input by subtracting the simulated talk-back voice and the simulated audio sound from sound inputted into the microphone;
wherein the talk-back voice is filtered before being outputted by the first speaker and the audio sound is filtered before being outputted by the second speaker.
4. The voice recognition system of claim 3, further comprising an articulation enhancing processor connected with said first adaptive filter and the input-voice extracting unit.
5. The voice recognition system of claim 3, further comprising at least one additional speaker operable to output an additional channel of audio sound and at least one additional adaptive filter unit operable to generate a simulated audio sound produced by said at least one additional speaker.
6. A voice recognition system, comprising:
a microphone;
at least one speaker operable to output a talk-back voice;
a first adaptive filter unit operable to generate a simulated mixed voice to be inputted into the microphone by setting a first filter coefficient;
an input-voice extracting unit operable to extract the input voice by subtracting the simulated mixed voice from sound inputted into the microphone; and
a gain controller connected with the first adaptive filter unit and the input-voice extracting unit and operable to adjust the volume of the talk-back voice.
7. The voice recognition system of claim 6, wherein:
said at least one speaker comprises a plurality of speakers provided for outputting a plurality of channels of audio sounds, and the talk-back voice is outputted from at least one of the plurality of speakers; and
said first adaptive filter unit is operable to receive a mixed voice including the talk-back voice and audio sound of a channel output from said at least one of the plurality of speakers.
8. The voice recognition system of claim 7, further comprising a second adaptive filter unit operable to generate a simulated audio sound to be inputted into the microphone by setting a second filter coefficient, wherein said input-voice extracting unit is operable to extract the input voice by subtracting the simulated mixed voice and the simulated audio sound from sound inputted into the microphone.
9. The voice recognition system of claim 7, further comprising:
a variable filter operable to copy the first filter coefficient set by said first adaptive filter unit and filter the talk-back voice before being outputted from said at least one of the plurality of speakers to generate a simulated talk-back voice; and
an articulation enhancing processor operable to enhance the articulation of the talk-back voice before being outputted from said at least one of the plurality of speakers.
10. A voice recognition method comprising the acts of:
providing a voice recognition engine;
providing a microphone;
providing a speaker;
setting a voice input state to an enable state in which voice input is accepted;
inputting sound into the microphone;
generating a simulated talk-back voice by setting a filter coefficient in an adaptive filter unit;
generating a talk-back voice;
extracting a voice input by subtracting the simulated talk-back voice from the sound inputted into the microphone;
filtering the talk-back voice prior to output by the speaker; and
supplying the extracted voice input to the voice recognition engine.
11. The voice recognition method of claim 10, further comprising the act of using the simulated talk-back voice to enhance the articulation of the talk-back voice prior to output from the speaker.
12. A voice recognition method comprising the acts of:
providing a voice recognition engine;
providing a microphone;
providing a first speaker operable to output a talk-back voice;
providing a second speaker operable to output audio sound;
setting a voice input state to an enable state in which voice input is accepted;
inputting sound into the microphone;
generating a simulated talk-back voice by setting a first filter coefficient in a first adaptive filter unit;
generating a simulated audio sound by setting a second filter coefficient in a second adaptive filter unit;
generating a talk-back voice;
filtering the talk-back voice prior to output by the first speaker;
filtering the audio sound prior to output by the second speaker;
extracting a voice input by subtracting the simulated talk-back voice and simulated audio sound from the sound inputted into the microphone; and
supplying the extracted voice input to the voice recognition engine.
13. The voice recognition method of claim 12, further comprising the act of using the simulated talk-back voice to enhance the articulation of the talk-back voice prior to output from the first speaker.
14. A voice recognition method comprising the acts of:
providing a voice recognition engine;
providing a microphone;
providing a speaker operable to output a mixed voice, the mixed voice comprising a talk-back voice and an audio sound output;
setting a voice input state to an enable state in which voice input is accepted;
inputting sound into the microphone;
generating a simulated mixed voice by setting a filter coefficient in an adaptive filter unit;
extracting a voice input by subtracting the simulated mixed voice from the sound inputted into the microphone;
filtering a mixed voice prior to output by the speaker; and
supplying the extracted voice input to the voice recognition engine.
15. A voice recognition method comprising the acts of:
providing a voice recognition engine;
providing a microphone;
providing a plurality of speakers operable to output a plurality of channels of audio sounds and a talk-back voice, wherein the talk-back voice is outputted from at least one of the plurality of speakers;
setting a voice input state to an enable state in which voice input is accepted;
inputting sound into the microphone;
generating a simulated mixed voice by setting a filter coefficient in an adaptive filter unit;
extracting a voice input by subtracting the simulated mixed voice from the sound inputted into the microphone;
filtering a mixed voice prior to output by at least one of the plurality of speakers; and
supplying the extracted voice input to the voice recognition engine.
16. The voice recognition method of claim 15, further comprising the act of:
generating a simulated talk-back voice by copying the filter coefficient set in the adaptive filter unit into a variable filter and by using the variable filter to filter the talk-back voice prior to output from at least one of the plurality of speakers; and
using the simulated talk-back voice to enhance the articulation of the talk-back voice prior to output from at least one of the plurality of speakers.
US10/835,742 2003-05-02 2004-04-30 Speech recognition system and method utilizing adaptive cancellation for talk-back voice Active 2026-07-26 US7552050B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2003127378A JP4209247B2 (en) 2003-05-02 2003-05-02 Speech recognition apparatus and method
JP2003-127378 2003-05-02

Publications (2)

Publication Number Publication Date
US20040260549A1 true US20040260549A1 (en) 2004-12-23
US7552050B2 US7552050B2 (en) 2009-06-23

Family

ID=32985618

Family Applications (1)

Application Number Title Priority Date Filing Date
US10/835,742 Active 2026-07-26 US7552050B2 (en) 2003-05-02 2004-04-30 Speech recognition system and method utilizing adaptive cancellation for talk-back voice

Country Status (5)

Country Link
US (1) US7552050B2 (en)
EP (1) EP1475781B1 (en)
JP (1) JP4209247B2 (en)
CN (1) CN1258753C (en)
DE (1) DE602004014675D1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2257082A1 (en) * 2009-05-28 2010-12-01 Harman Becker Automotive Systems GmbH Background noise estimation in a loudspeaker-room-microphone system
CN101902674B (en) * 2010-08-13 2012-11-28 西安交通大学 Self-excitation eliminating method of high gain public address system based on space counteracting
US9373338B1 (en) * 2012-06-25 2016-06-21 Amazon Technologies, Inc. Acoustic echo cancellation processing based on feedback from speech recognizer
CN103971681A (en) * 2014-04-24 2014-08-06 百度在线网络技术(北京)有限公司 Voice recognition method and system
CN104167212A (en) * 2014-08-13 2014-11-26 深圳市泛海三江科技发展有限公司 Audio processing method and device of intelligent building system
KR102437156B1 (en) * 2015-11-24 2022-08-26 삼성전자주식회사 Electronic device and method for processing voice signal according to state of electronic device
EP3410433A4 (en) * 2016-01-28 2019-01-09 Sony Corporation Information processing device, information processing method, and program
JP2019020678A (en) * 2017-07-21 2019-02-07 株式会社レイトロン Noise reduction device and voice recognition device
JP7186375B2 (en) * 2018-03-29 2022-12-09 パナソニックIpマネジメント株式会社 Speech processing device, speech processing method and speech processing system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0522779A (en) 1991-07-09 1993-01-29 Sony Corp Speech recognition remote controller
US5548681A (en) * 1991-08-13 1996-08-20 Kabushiki Kaisha Toshiba Speech dialogue system for realizing improved communication between user and system
JPH08335094A (en) 1995-06-08 1996-12-17 Nippon Telegr & Teleph Corp <Ntt> Voice input method and device for executing this method
US5848163A (en) * 1996-02-02 1998-12-08 International Business Machines Corporation Method and apparatus for suppressing background music or noise from the speech input of a speech recognizer

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5241692A (en) * 1991-02-19 1993-08-31 Motorola, Inc. Interference reduction system for a speech recognition device
US5412735A (en) * 1992-02-27 1995-05-02 Central Institute For The Deaf Adaptive noise reduction circuit for a sound reproduction system
US5548682A (en) * 1993-02-16 1996-08-20 Mita Industrial Co., Ltd. Method of automatically creating control sequence software and apparatus therefor
US5615270A (en) * 1993-04-08 1997-03-25 International Jensen Incorporated Method and apparatus for dynamic sound optimization
US5796849A (en) * 1994-11-08 1998-08-18 Bolt, Beranek And Newman Inc. Active noise and vibration control system accounting for time varying plant, using residual signal to create probe signal
US5664019A (en) * 1995-02-08 1997-09-02 Interval Research Corporation Systems for feedback cancellation in an audio interface garment
US5864804A (en) * 1995-06-10 1999-01-26 U.S. Philips Corporation Voice recognition system
US5907622A (en) * 1995-09-21 1999-05-25 Dougherty; A. Michael Automatic noise compensation system for audio reproduction equipment
US5822402A (en) * 1996-05-02 1998-10-13 Marszalek; Gary Allen Method and apparatus for processing synthesized speech and synthesizer volume for calling line identification data messages
US6700977B2 (en) * 1997-04-15 2004-03-02 Nec Corporation Method and apparatus for cancelling multi-channel echo
US6263078B1 (en) * 1999-01-07 2001-07-17 Signalworks, Inc. Acoustic echo canceller with fast volume control compensation
US7039182B1 (en) * 1999-05-28 2006-05-02 3Com Corporation Echo canceller having improved noise immunity
US7340063B1 (en) * 1999-07-19 2008-03-04 Oticon A/S Feedback cancellation with low frequency input
US20030040910A1 (en) * 1999-12-09 2003-02-27 Bruwer Frederick J. Speech distribution system
US20020041678A1 (en) * 2000-08-18 2002-04-11 Filiz Basburg-Ertem Method and apparatus for integrated echo cancellation and noise reduction for fixed subscriber terminals
US6725193B1 (en) * 2000-09-13 2004-04-20 Telefonaktiebolaget Lm Ericsson Cancellation of loudspeaker words in speech recognition
US6876968B2 (en) * 2001-03-08 2005-04-05 Matsushita Electric Industrial Co., Ltd. Run time synthesizer adaptation to improve intelligibility of synthesized speech
US20030069727A1 (en) * 2001-10-02 2003-04-10 Leonid Krasny Speech recognition using microphone antenna array
US7079645B1 (en) * 2001-12-18 2006-07-18 Bellsouth Intellectual Property Corp. Speaker volume control for voice communication device
US7421017B2 (en) * 2002-08-13 2008-09-02 Fujitsu Limited Digital filter adaptively learning filter coefficient
US20040111258A1 (en) * 2002-12-10 2004-06-10 Zangi Kambiz C. Method and apparatus for noise reduction

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006104845A1 (en) 2000-12-21 2006-10-05 Medtronic, Inc. System and method for ventricular pacing with progressive conduction check interval
US20070225049A1 (en) * 2006-03-23 2007-09-27 Andrada Mauricio P Voice controlled push to talk system
US20090187406A1 (en) * 2008-01-17 2009-07-23 Kazunori Sakuma Voice recognition system
US8209177B2 (en) 2008-01-17 2012-06-26 Alpine Electronics, Inc. Voice recognition system having articulated talk-back feature
US20090259397A1 (en) * 2008-04-10 2009-10-15 Richard Stanton Navigation system with touchpad remote
US9190057B2 (en) * 2012-12-12 2015-11-17 Amazon Technologies, Inc. Speech model retrieval in distributed speech recognition systems
US20140163977A1 (en) * 2012-12-12 2014-06-12 Amazon Technologies, Inc. Speech model retrieval in distributed speech recognition systems
US10152973B2 (en) 2012-12-12 2018-12-11 Amazon Technologies, Inc. Speech model retrieval in distributed speech recognition systems
CN107274897A (en) * 2013-04-10 2017-10-20 威盛电子股份有限公司 Voice control method and mobile terminal apparatus
US20140309996A1 (en) * 2013-04-10 2014-10-16 Via Technologies, Inc. Voice control method and mobile terminal apparatus
US20140350926A1 (en) * 2013-05-24 2014-11-27 Motorola Mobility Llc Voice Controlled Audio Recording System with Adjustable Beamforming
US9984675B2 (en) * 2013-05-24 2018-05-29 Google Technology Holdings LLC Voice controlled audio recording system with adjustable beamforming
US10643613B2 (en) 2014-06-30 2020-05-05 Samsung Electronics Co., Ltd. Operating method for microphones and electronic device supporting the same
US9679563B2 (en) 2014-06-30 2017-06-13 Samsung Electronics Co., Ltd. Operating method for microphones and electronic device supporting the same
US10062382B2 (en) 2014-06-30 2018-08-28 Samsung Electronics Co., Ltd. Operating method for microphones and electronic device supporting the same
US20180166073A1 (en) * 2016-12-13 2018-06-14 Ford Global Technologies, Llc Speech Recognition Without Interrupting The Playback Audio
EP3611724A4 (en) * 2017-04-10 2020-04-29 Beijing Orion Star Technology Co., Ltd. Voice response method and device, and smart device
US10726837B2 (en) 2017-11-02 2020-07-28 Hisense Visual Technology Co., Ltd. Voice interactive device and method for controlling voice interactive device
US11302328B2 (en) 2017-11-02 2022-04-12 Hisense Visual Technology Co., Ltd. Voice interactive device and method for controlling voice interactive device

Also Published As

Publication number Publication date
EP1475781A3 (en) 2004-12-15
CN1542734A (en) 2004-11-03
EP1475781A2 (en) 2004-11-10
JP2004333704A (en) 2004-11-25
CN1258753C (en) 2006-06-07
JP4209247B2 (en) 2009-01-14
EP1475781B1 (en) 2008-07-02
DE602004014675D1 (en) 2008-08-14
US7552050B2 (en) 2009-06-23

Similar Documents

Publication Publication Date Title
US7552050B2 (en) Speech recognition system and method utilizing adaptive cancellation for talk-back voice
US7747028B2 (en) Apparatus and method for improving voice clarity
JP4583781B2 (en) Audio correction device
CN106664473B (en) Information processing apparatus, information processing method, and program
JP4333369B2 (en) Noise removing device, voice recognition device, and car navigation device
US20090022330A1 (en) System for processing sound signals in a vehicle multimedia system
US8098848B2 (en) System for equalizing an acoustic signal
US7684983B2 (en) Speech recognition apparatus and vehicle incorporating speech recognition apparatus
JP2002330500A (en) Automatic sound field correction device and computer program for it
US11089404B2 (en) Sound processing apparatus and sound processing method
US7986796B2 (en) Apparatus to generate multi-channel audio signals and method thereof
US10115392B2 (en) Method for adjusting a voice recognition system comprising a speaker and a microphone, and voice recognition system
JP3877271B2 (en) Audio cancellation device for speech recognition
GB2320873A (en) Echo canceller for video conferencing
JP4381291B2 (en) Car audio system
EP1575034A1 (en) Input sound processor
JP3822397B2 (en) Voice input / output system
JP2541062B2 (en) Sound reproduction device
JP5188558B2 (en) Audio processing device
JP2922397B2 (en) Vehicle sound system
JP2001236090A (en) Voice input device
JP2001024459A (en) Audio device
JP4999267B2 (en) Voice input device
JP4166000B2 (en) Voice recognition device
JP4515731B2 (en) Audio correction device

Legal Events

Date Code Title Description
AS Assignment

Owner name: ALPINE ELECTRONICS, INC., JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MATSUMOTO, SHUICHI;MARUMOTO, TORU;REEL/FRAME:015754/0116;SIGNING DATES FROM 20040802 TO 20040810

STCF Information on status: patent grant

Free format text: PATENTED CASE

CC Certificate of correction
FPAY Fee payment

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 8

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 12