US20080215325A1 - Technique for accurately detecting system failure - Google Patents
- Publication number
- US20080215325A1 (application US 11/964,858)
- Authority
- US
- United States
- Prior art keywords
- grammar
- speech
- database
- acknowledgement
- utterance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0751—Error or fault detection not based on redundancy
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0709—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
Definitions
- the present invention relates to a method for dividing speech.
- the present invention relates to a method for dividing speech by use of acknowledgement responses.
- a speech of an operator and that of a customer are separately recorded and converted into text data.
- a position where a predetermined keyword such as a product name is spoken is recorded, and the timestamp of the position is used as an index.
- the transcription process is performed by specifying a position of the keyword with automatic speech recognition and the like, and then by replaying the speech of a corresponding part.
- information on the conversation contents cannot be accurately extracted in such a method, since the method does not effectively use the customer's speech, particularly an acknowledgement. To be specific, it is difficult to accurately recognize and analyze a speech from a voice stream, since the speech is not divided into appropriate utterance.
- An object of the present invention is to divide a conversational dialog into speech units, what is called an utterance in linguistics by use of acknowledgement responses (hereinafter, simply called acknowledgements).
- another object of the invention is to accurately recognize a speech from a voice stream.
- an aspect of the present invention is to provide an apparatus for dividing a conversational dialog into utterance.
- the apparatus is configured to include: a word database for storing spellings and pronunciations of words; a grammar database for storing syntactic rules on words; a pause detecting section which detects a pause location in a channel taking a turn, that is to say speaker's channel, hereafter a main speech, among conversational dialogs inputted in at least two channels; an acknowledgement detecting section which detects an acknowledgement location in a channel not speaking (listener's channel); a boundary-candidate extracting section which extracts boundary candidates in the main speech, by extracting the pauses existing within a predetermined range before and after a base point that is the acknowledgement location; and a recognizing unit which outputs a word string of the speech segmented by one of the extracted candidates after dividing the segmented speech into optimal utterance in reference to the word database and grammar database.
- the grammar database may include fixed-phrase grammar, acknowledgement grammar and recognition grammar.
- the fixed-phrase grammar may include fixed phrases for starting and ending a confirmation
- the word database may include spellings and pronunciations of the fixed phrases for starting and ending a confirmation.
- the apparatus may include a recognition-target segment determination unit which determines in advance a recognition target segment to be divided into utterance, by referring to the fixed-phrase grammar.
- Another aspect of the present invention is to provide a method for dividing a conversational dialog into utterance by use of conversational dialogs inputted in separated channels, by use of a word database in which spellings and pronunciations of words are described, and by use of a grammar database in which grammar including syntactic rules on words is described.
- the method includes the steps of: detecting a pause location in a channel making a main speech; detecting an acknowledgement location in a channel not making the main speech; extracting boundary candidates in the main speech by extracting pauses existing within a predetermined range before and after a base point that is the acknowledgement location; and outputting a word string of the speech segmented by one of the extracted boundary candidates, after dividing the segmented speech into optimal utterance in reference to the word database and grammar database.
- FIG. 1 shows an apparatus for dividing a speech into utterance.
- FIG. 2 shows a flowchart of a processing of the present invention.
- FIG. 3 shows a diagram of specific examples of each processing of the present invention.
- FIG. 4 is a diagram showing differences in segmentation and recognition results between a conventional technique and the embodiment of the present invention.
- FIG. 5 shows an example of a hardware configuration capable of implementing the present invention.
- FIG. 1 shows an apparatus of the present invention for dividing a speech voice into utterance.
- the apparatus is mainly configured of a boundary extracting unit 120 and a recognizing unit 130 .
- the boundary extracting unit 120 is configured of a pause detecting section 122 , an acknowledgement detecting section 124 and a boundary-candidate extracting section 126 .
- both the boundary extracting unit 120 and the recognizing unit 130 make reference to a word database 140 and a grammar database 150 .
- a spelling and pronunciation of each word is stored in the word database 140
- syntactic rules on words are stored in the grammar database 150 .
- each pronunciation in the word database 140 is written in phonemic units.
- Each phoneme is preferably written by use of an acoustic model to indicate what statistical property the signal of the phoneme has. By using such acoustic models, a part of conversational dialogs can be identified as which word.
- the grammar database 150 stores fixed-phrase grammar, acknowledgement grammar and recognition grammar therein.
- the grammar is a rule for judging whether a speech agrees with a fixed pattern. Examples of the above grammar described in the BNF (Backus-Naur Form) are shown below.
- an acknowledgement is a response for making a confirmation in response to speech of a conversation partner.
- an acknowledgement is a back-channel feedback which is a short response made to a conversation partner with a main speaker unchanged in a conversation.
- acknowledgements are not positively used in dividing and recognizing speech, but rather are regarded as unnecessary.
- three acknowledgements are registered as examples in the aforementioned acknowledgement grammar, and other acknowledgements can be added as needed to the acknowledgement grammar.
- To the boundary extracting unit 120, continuous conversational dialogs 110, that is, voice streams, are inputted in a plurality of channels corresponding to the respective speakers.
- the boundary extracting unit 120 passes, to the recognizing unit 130 , voice data of the channel mainly speaking, and boundary candidates for dividing the main speech into utterance. Specifically, a boundary candidate is passed by use of a time of the initial point of the voice stream, regarding the initial point as a base point of the boundary candidate.
- the recognizing unit 130 makes recognition using the received voice data and the aforementioned boundary candidates by referring to the recognition grammar and to a dictionary, and then outputs a word string 160 which is a recognition result corresponding to a segment obtained by dividing the voice data by dividing positions (strings) recognized as optimal.
- each of the word strings is, for example, configured of a segment and a recognized content which are expressed as XXXX for segment 1, YYYY for segment 2 and so on.
- the word string 160 is further passed to a sentence comprehending unit or the like in some applications. However, a description of the sentence comprehending unit will be omitted since the processing thereof is independent from the present invention.
- FIG. 2 shows a flowchart of a process of the present invention
- FIG. 3 shows, by use of voice-waveform data, a more specific process corresponding to each of the steps in FIG. 2 .
- the voice-waveform is expressed in trapezoidal shapes connecting the vertexes of the waveform.
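- FIG. 3 describes a case where the main speech is “dewa fukushou sasete itadakimasu. shouhin bangou 275 no IBM fando 20 kuchi, kawase hejji nasi. arigatou gozaimasita. (Now, your order will be repeated. Product code 275, IBM funds, 20 units, without exchange hedge. Thank you.)”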
- C 1 and C 2 indicate channels, where C 2 is the voice-waveform of a speaker and C 1 is the voice-waveform of a listener who gives acknowledgements while listening to the speech.
- the object here is to appropriately recognize and comprehend the main speech (speaker's speech) by use of acknowledgements of a channel different from that of the main speech. To this end, the processing is executed by the following steps.
- a time (t s , t e ) of a speech segment to be recognized is recorded, by matching the speech of the channel corresponding to the main speech, with the fixed-phrase grammar.
- This processing is equivalent to 310 in FIG. 3 .
- In the fixed-phrase grammar, various fixed phrases each appearing at the beginning or ending of a group of speech are registered. Typical examples of the fixed phrases for confirmation in a telephone ordering are: “dewa fukushou sasete itadakimasu (now, your order will be repeated)” as a starting phrase; and “arigatou gozaimasita (thank you)” as an ending phrase.
- the fixed phrases for starting and ending confirmation are determined dependent on an application field, and are not limited to the above.
- a garbage grammar (model), which matches phrases other than the fixed phrases to be recognized, is often used in combination with the fixed-phrase grammar.
- a description of the garbage grammar is omitted since it is a well-known technique in the field.
- Steps 220 to 250 are the processing executed in the boundary extracting unit 120 for extracting boundaries (dividing position) in a speech.
- the boundary extracting unit 120 extracts boundary (dividing position) candidates in a speech in the following manner of: firstly, detecting pause locations in the channel which makes a main speech and detecting acknowledgement locations in the channel which does not make the main speech; and secondly, extracting pauses each existing within a certain time period before and after an acknowledgement location.
- the pause detecting section 122 detects pause locations in the recognition target segment in C 2 .
- This processing is equivalent to 320 in FIG. 3 .
- the detection is performed by matching the speech with a power of the usual speech and with a silent phoneme model learned in advance.
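- The power-based part of this detection can be sketched as follows. This is only a minimal illustration, assuming the channel has already been converted into a sequence of per-frame power values; the function name `detect_pauses`, the threshold, the frame length and the minimum pause length are all hypothetical, and the silent phoneme model mentioned above is not modelled here.

```python
def detect_pauses(frame_power, threshold=0.1, min_pause_frames=20, frame_sec=0.01):
    """Return the midpoint time (seconds) of every low-power run long enough to be a pause.

    All parameter values are illustrative assumptions, not values from the source.
    """
    pauses = []
    run_start = None  # index of the first frame of the current low-power run
    # A trailing sentinel frame above the threshold closes a run that reaches the end.
    for i, power in enumerate(list(frame_power) + [float("inf")]):
        if power < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_pause_frames:
                pauses.append((run_start + i) / 2 * frame_sec)
            run_start = None
    return pauses


# Hypothetical power contour: speech, a 30-frame silence, speech, a 5-frame dip, speech.
power = [1.0] * 50 + [0.0] * 30 + [1.0] * 50 + [0.0] * 5 + [1.0] * 10
strict = detect_pauses(power)                       # only the long silence
lenient = detect_pauses(power, min_pause_frames=5)  # also keeps the short dip
```

Lowering `min_pause_frames` yields more candidates, which is one way to favour recall over precision at this stage.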
- the pauses extracted here are boundary candidates, while actual boundaries are finally decided by combining each of the candidates with likelihoods obtained by matching the candidate with an acknowledgement location and with grammar. For this reason, it is desirable to give priority not to a precision but to a recall in determining the boundary.
- the recall is a ratio of the number of pauses detected correctly to the number of pauses that should be detected.
- the precision is accuracy indicating how many of extracted locations are turned out to be the actual pauses each equivalent to a boundary of a sentence or a phrase.
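- The two measures can be made concrete with a small sketch. The pause times below are invented for illustration, and the matching tolerance is an assumption; the function name `precision_recall` is hypothetical.

```python
def precision_recall(detected, actual, tolerance=0.05):
    """Return (precision, recall) for detected pause times against true pause times (seconds)."""
    matched = set()       # indices of true pauses already paired with a detection
    true_positives = 0
    for d in detected:
        for i, a in enumerate(actual):
            if i not in matched and abs(d - a) <= tolerance:
                matched.add(i)
                true_positives += 1
                break
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Five detected pauses, three of which are real boundaries (hypothetical values).
precision, recall = precision_recall([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 3.0, 5.0])
```

Here every true pause is found (recall 1.0) at the cost of two spurious candidates (precision 0.6), which is the trade-off preferred in this step because spurious candidates can still be rejected later.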
- five pause candidates (p 1 , . . . , p 5 ) are detected totally.
- the acknowledgement detecting section 124 detects an acknowledgement location in the recognition target segment in channel C 1 which is for the respondent (listener), by matching the speech with the acknowledgement grammar.
- In the acknowledgement grammar, expressions of acknowledgement, for example, words and phrases such as “hai (yes)” and “ee (OK)”, are registered. This processing is equivalent to 330 in FIG. 3 .
- three acknowledgements (r 1 , . . . , r 3 ) are extracted.
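- As a rough illustration of matching the listener's channel against the acknowledgement grammar, the sketch below assumes the listener's speech is already available as (word, time) pairs; that input format, the times and the function name `detect_acknowledgements` are hypothetical. The three registered expressions are the ones given as examples in the acknowledgement grammar.

```python
# Example acknowledgements from the acknowledgement grammar: hai | ee | soudesu
ACKNOWLEDGEMENTS = {"hai", "ee", "soudesu"}


def detect_acknowledgements(tokens):
    """Return the times of listener tokens that match the acknowledgement grammar.

    `tokens` is a list of (word, time_in_seconds) pairs; this format is an assumption.
    """
    return [time for word, time in tokens if word in ACKNOWLEDGEMENTS]


# Hypothetical listener channel: three acknowledgements and one unrelated word.
listener = [("hai", 1.9), ("ano", 3.0), ("ee", 4.0), ("hai", 5.2)]
ack_times = detect_acknowledgements(listener)
```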
- the speech recognition is performed by dividing the recognition target segment by use of the pause locations and acknowledgement locations.
- This processing is equivalent to 340 in FIG. 3 .
- a first dividing start position is t s
- a base point is the acknowledgement location (r 1 ) first detected when the speech is scanned from the dividing start position.
- a pause existing within a certain range (r 1 − ΔTa, r 1 + ΔTb) around the base point is extracted as a boundary candidate for the dividing end position.
- p 1 , p 2 and p 3 are extracted.
- ΔTa and ΔTb are determined in accordance with the maximum allowable time difference between the target dividing end position and an acknowledgement, and are normally set to 500 msec to 1000 msec.
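- The candidate extraction around a base point can be sketched as below. The pause and acknowledgement times are invented so that three pauses fall inside the window, as in the example above; the function names and the 0.75 second window widths (chosen inside the 500 msec to 1000 msec range) are assumptions.

```python
def first_acknowledgement_after(acks, start):
    """Base point: the first acknowledgement time found when scanning from `start`."""
    later = [r for r in sorted(acks) if r > start]
    return later[0] if later else None


def extract_boundary_candidates(pauses, base, delta_ta=0.75, delta_tb=0.75):
    """Keep the pauses lying within [base - delta_ta, base + delta_tb] (seconds)."""
    return [p for p in pauses if base - delta_ta <= p <= base + delta_tb]


# Hypothetical times in seconds for p1..p5 and r1..r3.
pauses = [1.2, 1.8, 2.3, 3.9, 5.0]
acks = [1.9, 4.0, 5.2]
base = first_acknowledgement_after(acks, start=0.0)      # r1
candidates = extract_boundary_candidates(pauses, base)   # p1, p2, p3
```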
- In step 250, the recognizing unit 130 performs recognition on the segment between the dividing start position and each of the boundary candidates extracted in step 240 . Thereafter, the recognizing unit 130 recognizes the segment with the highest likelihood as an utterance, and thereby outputs a word string.
- This processing is equivalent to 350 in FIG. 3 .
- the recognizing unit 130 recognizes each of the segments having one of the extracted boundary candidates as the dividing end position, namely, segment A, segment B and segment C, by matching them with the recognition grammar. Then, the recognizing unit 130 obtains the recognition results and calculates likelihoods thereof at the same time.
- the recognition grammar is a collection of phrases which are supposed to appear in a speech, and which are each described in a divided speech segment as a unit. Accordingly, recognition succeeds, for example, when the following fixed patterns appear in a speech:
- One of a variety of possible likelihoods is a likelihood calculated on the basis of a probability P(X|W).
- the learning for generating the language model from a written learning text is performed with a sentence start/end symbols added to each location corresponding to the acknowledgement.
- the generated language model is used in the recognition.
- the likelihood may be calculated by using a product P(W)·P(X|W).
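- A small sketch of comparing candidate segments by such a product follows. It assumes the usual speech-recognition reading in which W is a hypothesised word string, P(W) comes from the language model and P(X|W) from acoustic matching; the source text here does not spell this out. The probability values are invented, and the product is evaluated as a sum of logarithms only to avoid numerical underflow.

```python
import math


def combined_log_likelihood(p_w, p_x_given_w):
    """log(P(W) * P(X|W)), computed as a sum of logs."""
    return math.log(p_w) + math.log(p_x_given_w)


# Hypothetical (P(W), P(X|W)) pairs for segments A, B and C.
hypotheses = {"A": (0.02, 0.06), "B": (0.10, 0.05), "C": (0.30, 0.005)}

best_acoustic = max(hypotheses, key=lambda k: hypotheses[k][1])
best_combined = max(hypotheses, key=lambda k: combined_log_likelihood(*hypotheses[k]))
```

With these made-up numbers, P(X|W) alone would favour segment A, while the product favours segment B, which shows how the language model term can change the chosen boundary.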
- In step 255, it is determined whether or not the processing is completed up to the end of the recognition target segment. If the processing is completed, the processing is terminated, while if not, the processing continues to step 260 .
- In step 260, the dividing start position is changed to the dividing end position determined in step 250 , namely, the right end of segment B, denoted by p 2 . Then, steps 240 to 260 are repeated to the end of the recognition target segment. This processing is equivalent to 360 of FIG. 3 .
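- The loop over steps 240 to 260 can be sketched as follows. This is a simplified illustration, not the recognizer itself: the `score` callback stands in for the likelihood returned by recognition, and the handling of an acknowledgement with no nearby pause and of the remainder after the last boundary are assumptions, as are all names and times.

```python
def segment_main_speech(ts, te, pauses, acks, score, delta_ta=0.75, delta_tb=0.75):
    """Divide [ts, te] into segments using acknowledgement-anchored pause candidates."""
    segments, start = [], ts
    pending = sorted(r for r in acks if ts <= r <= te)
    while pending:
        base = pending.pop(0)  # step 240: next acknowledgement is the base point
        if base <= start:
            continue           # already behind the current dividing start position
        candidates = [p for p in pauses
                      if start < p <= te and base - delta_ta <= p <= base + delta_tb]
        if not candidates:
            continue           # assumption: skip an acknowledgement with no nearby pause
        end = max(candidates, key=lambda p: score(start, p))  # step 250: best likelihood
        segments.append((start, end))
        start = end            # step 260: the end becomes the next start
    if start < te:
        segments.append((start, te))  # assumption: the remainder forms the last segment
    return segments


# Hypothetical scorer: pretends recognition succeeds only at these boundaries.
TRUE_BOUNDARIES = {1.8, 3.9, 5.0}
result = segment_main_speech(
    0.0, 6.0,
    pauses=[1.2, 1.8, 2.3, 3.9, 5.0],
    acks=[1.9, 4.0, 5.2],
    score=lambda start, end: 1.0 if end in TRUE_BOUNDARIES else 0.0,
)
```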
- FIG. 4 shows a difference in segmentation and recognition results between a conventional technique and the present invention.
- the conventional technique employed is a method in which the entire recognition target segment is automatically divided using only the recognition grammar while being recognized as one continuous speech.
- a box 410 shows the segmentation and recognition result of the conventional technique.
- the segmentation and recognition have failed in a large number of voice streams starting from “dewa chuumon fukushou sasete itadakimasu, machigaiga naika goisshoni kakunin wo onegai itasimasu (Now, your order will be repeated, and please check whether it contains any mistake).” and ending at “hai, dewa nyuuryoku itasimasita (now, your order is inputted).”
- a box 420 showing the segmentation and recognition results according to the present invention, speech segments of the main speech are correctly divided and recognized. Incidentally, commas and periods in the above description of the voice streams are inserted only to make the descriptions understandable.
- FIG. 5 shows a hardware configuration example of an information processing apparatus that can be also used as the apparatus of the present invention.
- a computer 501 includes a CPU peripheral unit that contains a CPU 500 , a RAM 540 , a ROM 530 and an input/output controller 520 all of which are mutually connected to one another via a host controller 510 .
- the computer 501 further includes a communication interface 550 , a hard disk drive 580 , a multi-combo drive 590 , a flexible disk drive 545 , a sound controller 560 and a graphic controller 570 , all of which are connected by the input/output controller 520 .
- the multi-combo drive 590 is capable of reading from and writing to disk-shaped media 595 such as a CD and a DVD
- the flexible disk drive 545 is capable of reading from and writing to a flexible disk 585
- the sound controller 560 drives a sound input/output device 565
- the graphic controller 570 drives a display 575 .
- the CPU 500 operates in accordance with programs stored in the ROM 530 , a BIOS and the RAM 540 , and thereby controls each section.
- the graphic controller 570 acquires graphic data generated in a buffer inside the RAM 540 , by the CPU 500 and the like, and then displays the data on the display 575 . Otherwise, the graphic controller 570 may include a buffer therein to store graphic data generated by the CPU 500 and the like.
- voice streams are inputted in a plurality of channels from the sound input/output device 565 , and then are stored in the storage device 580 via the input/output controller 520 .
- the word database 140 and the grammar database 150 are stored.
- Inputted and stored conversational dialogs of plural channels and these dictionaries are used to accurately divide and recognize a main speech through a computing operation by the CPU 500 .
- This computing operation is performed by loading to the memory 540 and then executing a program for speech segmentation and recognition of the present invention.
- Output results of speech segmentation and a word string are displayed on the display 575 .
- the communication interface 550 communicates with outer communication devices via a network.
- the information processing apparatus 501 can also receive conversational dialogs from the outside via the communication interface 550 , perform speech segmentation and recognition, and then transmit the result to an outer information processing apparatus via the communication interface 550 .
- any one of wired, wireless, infrared connections and a short distance radio connection such as BLUETOOTH can be employed to build the network, and any kind of network can be used without adding any alterations in order to implement the present invention.
- the storage device 580 stores codes and data for a program of the present invention, an application, an OS and the like used by the computer 501 .
- the multi-combo drive 590 reads programs or data from the media 595 such as a CD or a DVD, and thereafter the read programs and data are loaded to the RAM 540 to be used by the CPU 500 .
- the program and dictionaries of the present invention may be provided from an external recording media.
- an optical recording medium such as a DVD and a PD
- a magneto-optical recording medium such as an MD
- a tape medium, or a semiconductor memory such as an IC card
- the program may be obtained via the network from a server system connected to a dedicated communication network or the Internet by using, as the recording medium, a storage device such as a hard disk or a RAM provided in the server system.
- any hardware having the function of a general computer can be employed as hardware necessary for the present invention.
- a mobile terminal, a portable terminal and home electric appliances can be used freely without any problem.
- FIG. 5 just illustrates the hardware configuration for implementing the embodiment of the present invention. Accordingly, other various configurations are possible as long as the embodiment of the present invention can be applied thereto.
- Each of the abovementioned exemplary components is not necessarily an essential component of the present invention.
- the preferred information processing apparatus 501 of the present invention employs an operating system which supports a graphical user interface (GUI) multi-window environment, such as the Windows® operating system provided by Microsoft Corporation, Mac OS® provided by Apple Inc., and a Unix® system including the X Window System (for example, AIX® provided by International Business Machines Corporation). Additionally, the present invention can be achieved by employing hardware, software or a combination of hardware and software.
- GUI graphical user interface
- a voice stream is divided into appropriate utterance by using information obtained by combining an acknowledgement location and a pause location. Consequently, accuracy is improved in recognition and analysis of speech.
Abstract
An apparatus, method and program for dividing a conversational dialog into utterance. The apparatus includes a computer processor; a word database for storing spellings and pronunciations of words; a grammar database for storing syntactic rules on words; a pause detecting section which detects a pause location in a channel making a main speech among conversational dialogs inputted in at least two channels; an acknowledgement detecting section which detects an acknowledgement location in a channel not making the main speech; a boundary-candidate extracting section which extracts boundary candidates in the main speech, by extracting pauses existing within a predetermined range before and after a base point that is the acknowledgement location; and a recognizing unit which outputs a word string of the main speech segmented by one of the extracted boundary candidates after dividing the segmented speech into optimal utterance in reference to the word database and grammar database.
Description
- This application claims priority under 35 U.S.C. § 119 to Japanese Patent Application No. 2006-350936 filed Dec. 27, 2006, the entire text of which is specifically incorporated by reference herein.
- The present invention relates to a method for dividing speech. In particular, the present invention relates to a method for dividing speech by use of acknowledgement responses.
- In a transcription process on telephone conversation contents in an operator-service support system and the like, a speech of an operator and that of a customer are separately recorded and converted into text data. Conventionally, in order to efficiently search a recorded speech, a position where a predetermined keyword such as a product name is spoken is recorded, and the timestamp of the position is used as an index. The transcription process is performed by specifying a position of the keyword with automatic speech recognition and the like, and then by replaying the speech of a corresponding part. However, information on the conversation contents cannot be accurately extracted in such a method, since the method does not effectively use the customer's speech, particularly an acknowledgement. To be specific, it is difficult to accurately recognize and analyze a speech from a voice stream, since the speech is not divided into appropriate utterance.
- An example of a conventional technique is disclosed in Japanese Patent Application Laid-open Publication No. 2006-276754.
- The present invention has been made in view of the aforementioned technical problems. An object of the present invention is to divide a conversational dialog into speech units, what is called an utterance in linguistics by use of acknowledgement responses (hereinafter, simply called acknowledgements). In addition, another object of the invention is to accurately recognize a speech from a voice stream.
- In order to achieve the aforementioned objects, an aspect of the present invention is to provide an apparatus for dividing a conversational dialog into utterance. The apparatus is configured to include: a word database for storing spellings and pronunciations of words; a grammar database for storing syntactic rules on words; a pause detecting section which detects a pause location in a channel taking a turn, that is to say speaker's channel, hereafter a main speech, among conversational dialogs inputted in at least two channels; an acknowledgement detecting section which detects an acknowledgement location in a channel not speaking (listener's channel); a boundary-candidate extracting section which extracts boundary candidates in the main speech, by extracting the pauses existing within a predetermined range before and after a base point that is the acknowledgement location; and a recognizing unit which outputs a word string of the speech segmented by one of the extracted candidates after dividing the segmented speech into optimal utterance in reference to the word database and grammar database.
- In addition, the grammar database may include fixed-phrase grammar, acknowledgement grammar and recognition grammar. Moreover, the fixed-phrase grammar may include fixed phrases for starting and ending a confirmation, and the word database may include spellings and pronunciations of the fixed phrases for starting and ending a confirmation. Then, the apparatus may include a recognition-target segment determination unit which determines in advance a recognition target segment to be divided into utterance, by referring to the fixed-phrase grammar.
- Another aspect of the present invention is to provide a method for dividing a conversational dialog into utterance by use of conversational dialogs inputted in separated channels, by use of a word database in which spellings and pronunciations of words are described, and by use of a grammar database in which grammar including syntactic rules on words is described. The method includes the steps of: detecting a pause location in a channel making a main speech; detecting an acknowledgement location in a channel not making the main speech; extracting boundary candidates in the main speech by extracting pauses existing within a predetermined range before and after a base point that is the acknowledgement location; and outputting a word string of the speech segmented by one of the extracted boundary candidates, after dividing the segmented speech into optimal utterance in reference to the word database and grammar database.
- For a more complete understanding of the present invention and the advantage thereof, reference is now made to the following description taken in conjunction with the accompanying drawings.
- FIG. 1 shows an apparatus for dividing a speech into utterance.
- FIG. 2 shows a flowchart of a processing of the present invention.
- FIG. 3 shows a diagram of specific examples of each processing of the present invention.
- FIG. 4 is a diagram showing differences in segmentation and recognition results between a conventional technique and the embodiment of the present invention.
- FIG. 5 shows an example of a hardware configuration capable of implementing the present invention.
- FIG. 1 shows an apparatus of the present invention for dividing a speech voice into utterance. The apparatus is mainly configured of a boundary extracting unit 120 and a recognizing unit 130. The boundary extracting unit 120 is configured of a pause detecting section 122, an acknowledgement detecting section 124 and a boundary-candidate extracting section 126. In executing the processing of the present invention, both the boundary extracting unit 120 and the recognizing unit 130 make reference to a word database 140 and a grammar database 150. Specifically, a spelling and pronunciation of each word is stored in the word database 140, while syntactic rules on words are stored in the grammar database 150. Here, each pronunciation in the word database 140 is written in phonemic units. Each phoneme is preferably written by use of an acoustic model to indicate what statistical property the signal of the phoneme has. By using such acoustic models, it can be identified which word a given part of the conversational dialogs corresponds to. Additionally, the grammar database 150 stores fixed-phrase grammar, acknowledgement grammar and recognition grammar therein. Here, the grammar is a rule for judging whether a speech agrees with a fixed pattern. Examples of the above grammar described in the BNF (Backus-Naur Form) are shown below.
- fixed-phrase grammar:
    <a fixed-phrase from starting confirmation to ending the phrase>
      = sore? dewa (kakunin/fukushou) sasete itadaki masu.
      | arigatou gozaimasita
    (= now, your order will be confirmed|repeated. | thank you)
- acknowledgement grammar:
    <acknowledgements> = hai|ee|soudesu
    (= yes|OK|right)
- recognition grammar:
    <confirmation of contents>
      = shouhin bangou ga? <NUMBER> no <PRODUCT_NAME>
      | kawase hejji wa? (ari|nasi)
      | kuchisu wa <NUMBER> kuchi
    (= <PRODUCT_NAME> of the product number <NUMBER>
      | exchange hedge? (use|do not use)
      | <NUMBER> units)
    <NUMBER> = (0|1|2|3|4|5|6|7|8|9)+
    <PRODUCT_NAME> = IBM gurobaru fando|IT dorimu fando|doru kokusai|...
    (<PRODUCT_NAME> = IBM global fund|IT dream fund|dollar international|...)
- Note that an acknowledgement is a response for making a confirmation in response to speech of a conversation partner. Typically, an acknowledgement is a back-channel feedback which is a short response made to a conversation partner with a main speaker unchanged in a conversation. In conventional techniques, such acknowledgements are not positively used in dividing and recognizing speech, but rather are regarded as unnecessary. In the present invention, however, three acknowledgements are registered as examples in the aforementioned acknowledgement grammar, and other acknowledgements can be added as needed to the acknowledgement grammar.
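- For illustration only, the recognition grammar above can be approximated with regular expressions over romanized text. This is a sketch under assumptions: an actual recognizer matches speech against the grammar rather than text, the product list contains only the three names that are spelled out (the elided entries are not filled in), and the names `CONFIRMATION` and `parse_confirmation` are hypothetical.

```python
import re

NUMBER = r"[0-9]+"
# Only the product names spelled out in the grammar; the elided ones are omitted.
PRODUCT_NAME = r"(?:IBM gurobaru fando|IT dorimu fando|doru kokusai)"

# <confirmation of contents>, with "ga?" and "wa?" treated as optional particles.
CONFIRMATION = re.compile(
    rf"(?:shouhin bangou (?:ga )?(?P<code>{NUMBER}) no (?P<product>{PRODUCT_NAME})"
    rf"|kawase hejji (?:wa )?(?P<hedge>ari|nasi)"
    rf"|kuchisu wa (?P<units>{NUMBER}) kuchi)"
)


def parse_confirmation(text):
    """Return the named fields matched by the grammar, or None if the text does not fit."""
    match = CONFIRMATION.fullmatch(text)
    if match is None:
        return None
    return {name: value for name, value in match.groupdict().items() if value is not None}


product = parse_confirmation("shouhin bangou ga 275 no IBM gurobaru fando")
hedge = parse_confirmation("kawase hejji nasi")
units = parse_confirmation("kuchisu wa 20 kuchi")
rejected = parse_confirmation("hai")
```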
Continuous conversational dialogs 110, that is, voice streams, are inputted to the boundary extracting unit 120 in a plurality of channels corresponding to the respective speakers. By the method of the present invention described below, the boundary extracting unit 120 passes, to the recognizing unit 130, the voice data of the channel carrying the main speech, together with boundary candidates for dividing the main speech into utterances. Specifically, each boundary candidate is passed as a time measured from the initial point of the voice stream, which serves as the base point. The recognizing unit 130 performs recognition using the received voice data and the aforementioned boundary candidates by referring to the recognition grammar and to a dictionary, and then outputs a word string 160, which is the recognition result for each segment obtained by dividing the voice data at the dividing positions (strings) recognized as optimal. Each word string consists, for example, of a segment and its recognized content, expressed as XXXX for segment 1, YYYY for segment 2 and so on. Preferably, the word string 160 is further passed to a sentence comprehending unit or the like in some applications. However, a description of the sentence comprehending unit is omitted since its processing is independent of the present invention.

Hereinafter, a detailed description of the processing of the present invention will be provided with reference to
FIGS. 2 and 3. To be specific, the description is provided for a case of telephone ordering where conversational dialogs of two channels are inputted as voice streams. FIG. 2 shows a flowchart of a process of the present invention, while FIG. 3 shows, by use of voice-waveform data, a more specific process corresponding to each of the steps in FIG. 2. For the sake of simplicity, the voice waveform is expressed in trapezoidal shapes connecting the vertexes of the waveform. Here, FIG. 3 describes a case where the main speech is “dewa fukushou sasete itadakimasu. shouhin bangou 275 no IBM fando 20 kuchi, kawase hejji nasi. arigatou gozaimasita. (Now, your order will be repeated. Product code 275, IBM funds, 20 units, without exchange hedge. Thank you.)” Note that in FIG. 3, C1 and C2 indicate channels, where C2 is the voice waveform of the speaker and C1 is the voice waveform of a listener who gives acknowledgements while listening to the speech. The object here is to appropriately recognize and comprehend the main speech (the speaker's speech) by use of acknowledgements in a channel different from that of the main speech. To this end, the processing is executed by the following steps.

Firstly, in step 210, the time (ts, te) of the speech segment to be recognized is recorded by matching the speech of the channel corresponding to the main speech with the fixed-phrase grammar. This processing is equivalent to 310 in
FIG. 3. In the fixed-phrase grammar, various fixed phrases, each appearing at the beginning or end of a group of speech, are registered. Typical examples of the fixed phrases for confirmation in telephone ordering are: “dewa fukushou sasete itadakimasu (now, your order will be repeated)” as a starting phrase; and “arigatou gozaimasita (thank you)” as an ending phrase. The fixed phrases for starting and ending confirmation depend on the application field, and are not limited to the above. Moreover, in identifying (also referred to as spotting) the locations where such fixed phrases are spoken, a garbage grammar (model), which matches phrases other than the fixed phrases to be recognized, is often used in combination with the fixed-phrase grammar. However, a description of the garbage grammar is omitted since it is a well-known technique in the field. Alternatively, when the entire speech is the recognition target, step 210 is not executed, and the time of the speech segment is set as (ts, te)=(0, end of call).

Steps 220 to 250 are the processing executed in the boundary extracting unit 120 for extracting boundaries (dividing positions) in a speech. The boundary extracting unit 120 extracts boundary (dividing position) candidates in a speech in the following manner: firstly, it detects pause locations in the channel carrying the main speech and acknowledgement locations in the channel not carrying the main speech; secondly, it extracts the pauses that exist within a certain time period before and after an acknowledgement location.
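The second stage described above, keeping only the pauses that lie near an acknowledgement, can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the millisecond time unit, the inclusive window bounds and the default window widths are all assumptions chosen for the example.

```python
from typing import List, Sequence


def extract_boundary_candidates(
    pauses_ms: Sequence[int],
    ack_ms: int,
    before_ms: int = 500,  # assumed width of the window before the acknowledgement
    after_ms: int = 500,   # assumed width of the window after the acknowledgement
) -> List[int]:
    """Return the pause times (main-speech channel) that fall within the
    window around one acknowledgement time (listener channel)."""
    lower = ack_ms - before_ms
    upper = ack_ms + after_ms
    # Treating the window as inclusive at both ends is an assumption.
    return [p for p in sorted(pauses_ms) if lower <= p <= upper]
```

With pauses at 1000, 2100, 2600, 4000 and 5500 ms and an acknowledgement at 2400 ms, only the pauses at 2100 and 2600 ms remain as boundary candidates.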
To be more precise, in step 220, the pause detecting section 122 detects pause locations in the recognition target segment in C2. This processing is equivalent to 320 in FIG. 3. The detection is performed by comparing the speech against the power level of ordinary speech and by matching it with a silent-phoneme model learned in advance. The pauses extracted here are boundary candidates, while the actual boundaries are finally decided by combining each of the candidates with likelihoods obtained by matching the candidate with an acknowledgement location and with the grammar. For this reason, it is desirable to give priority not to precision but to recall in detecting the pauses. Specifically, recall is the ratio of the number of pauses detected correctly to the number of pauses that should be detected. Precision, meanwhile, indicates how many of the extracted locations turn out to be actual pauses, each equivalent to a boundary of a sentence or a phrase. In this embodiment, five pause candidates (p1, . . . , p5) are detected in total.

In step 230, the acknowledgement detecting section 124 detects acknowledgement locations in the recognition target segment in channel C1, which is the channel of the respondent (listener), by matching the speech with the acknowledgement grammar. In the acknowledgement grammar, expressions of acknowledgement, for example, words and phrases such as “hai (yes)” and “ee (OK)”, are registered. This processing is equivalent to 330 in
FIG. 3. In this embodiment, three acknowledgements (r1, . . . , r3) are extracted.

In the processing executed in step 240 and the following steps, speech recognition is performed by dividing the recognition target segment by use of the pause locations and acknowledgement locations. This processing is equivalent to 340 in
FIG. 3. Suppose that the first dividing start position is ts, and that the base point is the acknowledgement location (r1) detected first when the speech is scanned from the dividing start position. Then, each pause existing within a certain range (r1−ΔTa, r1+ΔTb) around the base point is extracted as a boundary candidate for the dividing end position. Here, p1, p2 and p3 are extracted. ΔTa and ΔTb are determined in accordance with the maximum allowable time difference between the target dividing end position and an acknowledgement, and are normally set to 500 msec to 1000 msec.

In step 250, the recognizing
unit 130 performs recognition on the segment between the dividing start position and each of the boundary candidates extracted in step 240. Thereafter, the recognizing unit 130 recognizes the segment with the highest likelihood as an utterance, and thereby outputs a word string. This processing is equivalent to 350 in FIG. 3. To be specific, the recognizing unit 130 recognizes each of the segments having one of the extracted boundary candidates as the dividing end position, namely, segment A, segment B and segment C, by matching them with the recognition grammar. The recognizing unit 130 obtains the recognition results and calculates their likelihoods at the same time. These likelihoods are normalized on a time basis according to the length of each speech segment, and are then compared with each other, so that the end of the segment having the highest likelihood is determined to be the dividing end position. Thus, as a result of dividing the speech into the segments, the determined dividing end position is outputted from the recognizing unit together with the recognition result (word string). In this embodiment, segment B, having the highest likelihood of 7.8, is determined as having the dividing end position. The recognition grammar is a collection of phrases which are expected to appear in a speech, each described with a divided speech segment as a unit. Accordingly, recognition succeeds, for example, when the following fixed patterns appear in a speech:
shouhin bangou ga? <NUMBER> no <PRODUCT_NAME> (the product code is? <NUMBER> of <PRODUCT_NAME>)
kawase hejji wa? (ari|nasi) (exchange hedge is? (used|not used))
kuchisuu wa <NUMBER> kuchi (the number of units is <NUMBER> units)

One of a variety of possible likelihoods is a likelihood calculated on the basis of the probability P(X|W) that, given an outputted word string W, the acoustic feature amount X is outputted from the target segment ((ts, p2) in this embodiment). This value is obtained as a by-product when the recognizing
unit 130 matches the string of acoustic feature amounts with the acoustic model. Since many studies have been made on the aforementioned likelihoods and confidence measures in the field of speech recognition, there are various possible methods for calculating the probability value. In step 250, the determination of a divided segment based on the recognition results and likelihoods may be made by using a statistical language model instead of the recognition grammar. In such a case, the learning for generating the language model from a written learning text is performed with sentence start/end symbols added at each location corresponding to an acknowledgement. The language model thus generated is then used in the recognition. In the case of using the statistical language model, the likelihood may be calculated by using the product P(W)·P(X|W), obtained by multiplying the probability P(X|W) of outputting the acoustic feature amount X by the probability P(W) that the language model assigns to the word string W.

In step 255, it is determined whether or not the processing has been completed up to the end of the recognition target segment. If it has, the processing is terminated; if not, the processing continues to step 260.
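The comparison of time-normalized likelihoods in step 250 can be sketched as below. This is a hedged illustration only: the text does not say how the normalization is carried out, so dividing a log-domain score by the segment duration is an assumption, as are the names `Hypothesis`, `normalized_score` and `select_best`. In the log domain, the product P(W)·P(X|W) becomes the sum log P(W) + log P(X|W).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Hypothesis:
    """One recognized candidate segment (hypothetical structure)."""
    start: float            # dividing start position, in seconds
    end: float              # boundary candidate used as the dividing end position
    words: str              # recognized word string W
    log_p_x_given_w: float  # acoustic log-likelihood, log P(X|W)
    log_p_w: float = 0.0    # language-model log-probability, log P(W); 0 if unused


def normalized_score(h: Hypothesis) -> float:
    """Log-domain likelihood divided by segment duration (assumed normalization)."""
    duration = h.end - h.start
    if duration <= 0:
        raise ValueError("segment must have positive duration")
    return (h.log_p_w + h.log_p_x_given_w) / duration


def select_best(hypotheses: Sequence[Hypothesis]) -> Hypothesis:
    """Pick the candidate segment with the highest normalized score."""
    return max(hypotheses, key=normalized_score)
```

Without normalization, shorter segments would tend to win simply because fewer frames contribute to the log-likelihood; dividing by duration is one simple way to make segments A, B and C comparable.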
In step 260, the dividing start position is changed to the dividing end position determined in step 250, namely, the right end of segment B, denoted by p2. Then, steps 240 to 260 are repeated up to the end of the recognition target segment. This processing is equivalent to 360 of FIG. 3.
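Steps 240 to 260 taken together can be sketched as a loop. This is a simplified illustration under several assumptions: the recognizer is passed in as a hypothetical callback that returns a word string and an already time-normalized likelihood, times are integers in milliseconds, the window bounds are inclusive, and an acknowledgement with no pause inside its window is simply skipped (the text does not specify that case).

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical recognizer interface: (start, end) -> (word string, normalized likelihood).
Recognizer = Callable[[int, int], Tuple[str, float]]


def segment_speech(
    ts: int,
    te: int,
    pauses: Sequence[int],
    acks: Sequence[int],
    recognize: Recognizer,
    dt_before: int,
    dt_after: int,
) -> List[Tuple[int, int, str]]:
    """Divide the target segment (ts, te) into utterances.

    Returns a list of (start, end, word string) tuples.
    """
    results: List[Tuple[int, int, str]] = []
    start = ts  # first dividing start position
    for base in sorted(acks):
        if start >= te:  # step 255: end of the recognition target segment reached
            break
        if base <= start:  # base point must lie after the dividing start position
            continue
        # Step 240: pauses near the base point become boundary candidates.
        candidates = [
            p for p in sorted(pauses)
            if start < p <= te and base - dt_before <= p <= base + dt_after
        ]
        if not candidates:  # assumption: skip acknowledgements with no nearby pause
            continue
        # Step 250: recognize each candidate segment and keep the most likely one.
        scored = [(recognize(start, p), p) for p in candidates]
        (words, _likelihood), end = max(scored, key=lambda item: item[0][1])
        results.append((start, end, words))
        # Step 260: the chosen dividing end position becomes the next start.
        start = end
    return results
```

Passing the recognizer as a callback keeps the loop independent of whether the likelihood comes from the recognition grammar or from a statistical language model.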
FIG. 4 shows the difference in segmentation and recognition results between a conventional technique and the present invention. Here, the conventional technique is a method in which the entire recognition target segment is automatically divided using only the recognition grammar while being recognized as one continuous speech. A box 410 shows the segmentation and recognition result of the conventional technique. In the box 410, the segmentation and recognition have failed in a large number of voice streams starting from “dewa chuumon fukushou sasete itadakimasu, machigaiga naika goisshoni kakunin wo onegai itasimasu (Now, your order will be repeated, and please check whether it contains any mistake).” and ending at “hai, dewa nyuuryoku itasimasita (now, your order is inputted).” On the other hand, in a box 420 showing the segmentation and recognition results according to the present invention, the speech segments of the main speech are correctly divided and recognized. Incidentally, the commas and periods in the above description of the voice streams are inserted only to make the descriptions understandable.

Next, an example of a hardware configuration of the present invention will be explained. Needless to say, each of the blocks shown in
FIG. 1 may be configured of dedicated hardware, while the blocks can equally be implemented on a general information processing apparatus. FIG. 5 shows a hardware configuration example of an information processing apparatus that can also be used as the apparatus of the present invention. A computer 501 includes a CPU peripheral unit that contains a CPU 500, a RAM 540, a ROM 530 and an input/output controller 520, all of which are connected to one another via a host controller 510. The computer 501 further includes a communication interface 550, a hard disk drive 580, a multi-combo drive 590, a flexible disk drive 545, a sound controller 560 and a graphic controller 570, all of which are connected by the input/output controller 520. Specifically, the multi-combo drive 590 is capable of reading from and writing to disk-shaped media 595 such as a CD and a DVD, the flexible disk drive 545 is capable of reading from and writing to a flexible disk 585, the sound controller 560 drives a sound input/output device 565, and the graphic controller 570 drives a display 575.

The
CPU 500 operates in accordance with programs stored in the ROM 530, a BIOS and the RAM 540, and thereby controls each section. The graphic controller 570 acquires graphic data generated by the CPU 500 and the like in a buffer inside the RAM 540, and then displays the data on the display 575. Alternatively, the graphic controller 570 may include a buffer therein to store graphic data generated by the CPU 500 and the like. To be more specific, voice streams are inputted in a plurality of channels from the sound input/output device 565, and are then stored in the storage device 580 via the input/output controller 520. The word database 140 and the grammar database 150 are stored in the storage device 580. The inputted and stored conversational dialogs of plural channels and these dictionaries are used to accurately divide and recognize a main speech through a computing operation by the CPU 500. This computing operation is performed by loading a program for speech segmentation and recognition of the present invention into the memory 540 and then executing it. The output results of the speech segmentation and a word string are displayed on the display 575.

The communication interface 550 communicates with external communication devices via a network. The information processing apparatus 501 can also receive conversational dialogs from the outside via the communication interface 550, perform speech segmentation and recognition, and then transmit the result to an external information processing apparatus via the communication interface 550. Incidentally, any one of wired, wireless and infrared connections and a short-distance radio connection such as BLUETOOTH can be employed to build the network, and any kind of network can be used without alteration in order to implement the present invention. The storage device 580 stores codes and data for a program of the present invention, an application, an OS and the like used by the computer 501.
The multi-combo drive 590 reads programs or data from the media 595 such as a CD or a DVD, and thereafter the read programs and data are loaded into the RAM 540 to be used by the CPU 500. Alternatively, the program and dictionaries of the present invention may be provided from an external recording medium.

As the external recording medium, an optical recording medium such as a DVD and a PD, a magneto-optical recording medium such as an MD, a tape medium, or a semiconductor memory such as an IC card can be used. Moreover, the program may be obtained via the network from a server system connected to a dedicated communication network or the Internet by using, as the recording medium, a storage device such as a hard disk or a RAM provided in the server system. As can be seen from the abovementioned configuration example, any hardware having the function of a general computer can be employed as the hardware necessary for the present invention. For example, a mobile terminal, a portable terminal and home electric appliances can be used without any problem. It should be noted that
FIG. 5 merely illustrates one hardware configuration for implementing the embodiment of the present invention. Accordingly, various other configurations are possible as long as the embodiment of the present invention can be applied thereto. None of the abovementioned exemplary components is necessarily an essential component of the present invention.

The preferred information processing apparatus 501 of the present invention employs an operating system which supports a graphical user interface (GUI) multi-window environment, such as the Windows® operating system provided by Microsoft Corporation, Mac OS® provided by Apple Inc., or a Unix® system including the X Window System (for example, AIX® provided by International Business Machines Corporation). Additionally, the present invention can be achieved by employing hardware, software or a combination of hardware and software.
According to the present invention, a voice stream is divided into appropriate utterances by using information obtained by combining an acknowledgement location and a pause location. Consequently, accuracy is improved in the recognition and analysis of speech.

Although the preferred embodiment of the present invention has been described in detail, it should be understood that various changes, substitutions and alterations can be made therein without departing from the spirit and scope of the present invention as defined by the appended claims.
Claims (13)
1. An apparatus for dividing a conversational dialog into utterance, comprising:
a computer processor;
a word database for storing spellings and pronunciations of words;
a grammar database for storing syntactic rules on words;
a pause detecting section which detects a pause location in a channel making a main speech among conversational dialogs inputted in at least two channels;
an acknowledgement detecting section which detects an acknowledgement location in a channel not making the main speech;
a boundary-candidate extracting section which extracts boundary candidates in the main speech, by extracting pauses existing within a predetermined range before and after a base point that is the acknowledgement location; and
a recognizing unit which outputs a word string of the main speech segmented by one of the extracted boundary candidates after dividing the segmented speech into optimal utterance in reference to the word database and grammar database.
2. The apparatus according to claim 1, wherein the grammar database includes fixed-phrase grammar, acknowledgement grammar and recognition grammar.
3. The apparatus according to claim 2, comprising a recognition-target segment determination unit, wherein:
the fixed-phrase grammar includes fixed phrases for starting and ending a confirmation,
the word database includes spellings and pronunciations of the fixed phrases for starting and ending a confirmation,
the recognition-target segment determination unit determines in advance a recognition target segment to be divided into utterance, by referring to the fixed-phrase grammar.
4. A method for dividing a conversational dialog into utterance by use of conversational dialogs inputted in a plurality of channels, by use of a word database in which spellings and pronunciations of words are described, and by use of a grammar database in which grammar including syntactic rules on words is described, the method comprising the steps of:
detecting a pause location in a channel making a main speech;
detecting an acknowledgement location in a channel not making the main speech;
extracting boundary candidates of the main speech by extracting pauses existing within a predetermined range before and after a base point that is the acknowledgement location; and
outputting a word string of the speech segmented by one of the extracted candidates after dividing the segmented speech into optimal utterance in reference to the word database and grammar database.
5. The method according to claim 4, wherein, in the step of outputting a word string, the likelihoods of speech segments divided by the boundary candidates are calculated in reference to the word database and the grammar database, and then the word string of the speech segment having the highest likelihood is outputted after dividing the speech segment into utterance.
6. The method according to claim 4, wherein the grammar database includes fixed-phrase grammar, acknowledgement grammar and recognition grammar.
7. The method according to claim 6, wherein:
the fixed-phrase grammar includes fixed phrases for starting and ending a confirmation;
the word database includes spellings and pronunciations of the fixed phrases for starting and ending a confirmation.
8. The method according to claim 6, the method further comprising determining a recognition target segment to be divided into utterance by referring to the fixed-phrase grammar.
9. A program embodied in computer memory for dividing a conversational dialog into utterance by use of conversational dialogs inputted in a plurality of channels, by use of a word database in which spellings and pronunciations of words are described, and by use of a grammar database in which grammar including syntactic rules on words is described,
the program causing a computer to execute the functions of:
detecting a pause location in a channel making a main speech;
detecting an acknowledgement location in a channel not making the main speech;
extracting boundary candidates in the main speech by extracting pauses existing within a predetermined range before and after a base point that is the acknowledgement location; and
outputting a word string of the speech segmented by one of the extracted boundary candidates after dividing the segmented speech into optimal utterance in reference to the word database and grammar database.
10. The program according to claim 9, wherein, in the function of outputting a word string, the likelihoods of speech segments divided by the boundary candidates are calculated in reference to the word database and the grammar database, and then the word string of the speech segment having the highest likelihood is outputted after dividing the speech segment into utterance.
11. The program according to claim 11 is not altered here; see original wording below.
the fixed-phrase grammar includes fixed phrases for starting and ending a confirmation;
the word database includes spellings and pronunciations of the fixed phrases for starting and ending a confirmation.
13. The program according to claim 11, the method further comprising a function of determining a recognition target segment to be divided into utterance by referring to the fixed-phrase grammar.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/252,009 US9128836B2 (en) | 2006-12-27 | 2011-10-03 | Technique for accurately detecting system failure |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2006-350936 | 2006-12-27 | ||
JP2006350936 | 2006-12-27 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/252,009 Continuation US9128836B2 (en) | 2006-12-27 | 2011-10-03 | Technique for accurately detecting system failure |
Publications (1)
Publication Number | Publication Date |
---|---|
US20080215325A1 (en) | 2008-09-04 |
Family ID=39588526
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/964,858 Abandoned US20080215325A1 (en) | 2006-12-27 | 2007-12-27 | Technique for accurately detecting system failure |
US13/252,009 Expired - Fee Related US9128836B2 (en) | 2006-12-27 | 2011-10-03 | Technique for accurately detecting system failure |
Country Status (6)
Country | Link |
---|---|
US (2) | US20080215325A1 (en) |
JP (1) | JP4866429B2 (en) |
KR (1) | KR101033447B1 (en) |
CN (1) | CN101568905B (en) |
TW (1) | TW200841189A (en) |
WO (1) | WO2008081844A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080154594A1 (en) * | 2006-12-26 | 2008-06-26 | Nobuyasu Itoh | Method for segmenting utterances by using partner's response |
US10134425B1 (en) * | 2015-06-29 | 2018-11-20 | Amazon Technologies, Inc. | Direction-based speech endpointing |
US11886170B2 (en) | 2019-03-15 | 2024-01-30 | Omron Corporation | Control system, setting device and setting program |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100017486A1 (en) * | 2008-07-16 | 2010-01-21 | Fujitsu Limited | System analyzing program, system analyzing apparatus, and system analyzing method |
KR101010911B1 (en) * | 2008-12-31 | 2011-01-26 | 엔에이치엔(주) | Method for sending and receiving message in message network system |
KR20120060655A (en) * | 2010-12-02 | 2012-06-12 | 한국전자통신연구원 | Routing Method And Apparatus For Detecting Server Attacking And Network Using Method Thereof |
CN104219105A (en) * | 2013-05-31 | 2014-12-17 | 英业达科技有限公司 | Error notification device and method |
US20150006730A1 (en) * | 2013-06-27 | 2015-01-01 | Sap Ag | Enabling multi-tenant virtual servers in a cloud system |
JP2015118685A (en) * | 2013-11-12 | 2015-06-25 | 株式会社リコー | Information processing system, information processing method, and program |
US9329937B1 (en) * | 2013-12-31 | 2016-05-03 | Google Inc. | High availability architecture |
JP2015215827A (en) * | 2014-05-13 | 2015-12-03 | 富士通株式会社 | Transmission order determination program, transmission order determination device and transmission order determination method |
CN107291558B (en) * | 2016-03-30 | 2020-11-24 | 阿里巴巴集团控股有限公司 | Application program interface deadlock monitoring method and device |
TWI691852B (en) | 2018-07-09 | 2020-04-21 | 國立中央大學 | Error detection device and error detection method for detecting failure of hierarchical system, computer-readable recording medium and computer program product |
Citations (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5761637A (en) * | 1994-08-09 | 1998-06-02 | Kabushiki Kaisha Toshiba | Dialogue-sound processing apparatus and method |
US5806021A (en) * | 1995-10-30 | 1998-09-08 | International Business Machines Corporation | Automatic segmentation of continuous text using statistical approaches |
US20020032564A1 (en) * | 2000-04-19 | 2002-03-14 | Farzad Ehsani | Phrase-based dialogue modeling with particular application to creating a recognition grammar for a voice-controlled user interface |
US20020184373A1 (en) * | 2000-11-01 | 2002-12-05 | International Business Machines Corporation | Conversational networking via transport, coding and control conversational protocols |
US6496799B1 (en) * | 1999-12-22 | 2002-12-17 | International Business Machines Corporation | End-of-utterance determination for voice processing |
US6694055B2 (en) * | 1998-07-15 | 2004-02-17 | Microsoft Corporation | Proper name identification in chinese |
US20040193400A1 (en) * | 2003-03-24 | 2004-09-30 | Mcdonald David D. | Method and system for producing cohesive phrases from fixed phrases in a natural language system |
US6873953B1 (en) * | 2000-05-22 | 2005-03-29 | Nuance Communications | Prosody based endpoint detection |
US7076430B1 (en) * | 2002-05-16 | 2006-07-11 | At&T Corp. | System and method of providing conversational visual prosody for talking heads |
US20060287847A1 (en) * | 2005-06-21 | 2006-12-21 | Microsoft Corporation | Association-based bilingual word alignment |
US7177810B2 (en) * | 2001-04-10 | 2007-02-13 | Sri International | Method and apparatus for performing prosody-based endpointing of a speech signal |
US20070067172A1 (en) * | 2005-09-22 | 2007-03-22 | Minkyu Lee | Method and apparatus for performing conversational opinion tests using an automated agent |
US20070071206A1 (en) * | 2005-06-24 | 2007-03-29 | Gainsboro Jay L | Multi-party conversation analyzer & logger |
US7313526B2 (en) * | 2001-09-05 | 2007-12-25 | Voice Signal Technologies, Inc. | Speech recognition using selectable recognition modes |
US7373300B1 (en) * | 2002-12-18 | 2008-05-13 | At&T Corp. | System and method of providing a spoken dialog interface to a website |
US7493257B2 (en) * | 2003-08-06 | 2009-02-17 | Samsung Electronics Co., Ltd. | Method and apparatus handling speech recognition errors in spoken dialogue systems |
US7493251B2 (en) * | 2003-05-30 | 2009-02-17 | Microsoft Corporation | Using source-channel models for word segmentation |
US7567902B2 (en) * | 2002-09-18 | 2009-07-28 | Nuance Communications, Inc. | Generating speech recognition grammars from a large corpus of data |
US7756709B2 (en) * | 2004-02-02 | 2010-07-13 | Applied Voice & Speech Technologies, Inc. | Detection of voice inactivity within a sound stream |
US7801838B2 (en) * | 2002-07-03 | 2010-09-21 | Ramp Holdings, Inc. | Multimedia recognition system comprising a plurality of indexers configured to receive and analyze multimedia data based on training data and user augmentation relating to one or more of a plurality of generated documents |
US7818174B1 (en) * | 2003-01-16 | 2010-10-19 | Comverse, Inc. | Speech-recognition grammar analysis |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4817092A (en) | 1987-10-05 | 1989-03-28 | International Business Machines | Threshold alarms for processing errors in a multiplex communications system |
US5968122A (en) * | 1997-03-31 | 1999-10-19 | Alcatel Alsthom Compagnie Generale D'electricite | Method for propagating between views of connection object status in network |
CN1439123A (en) | 2000-05-10 | 2003-08-27 | 泰克林克国际娱乐有限公司 | Security system for high level transactions between devices |
JP2001356972A (en) * | 2000-06-15 | 2001-12-26 | Fast Net Kk | Network monitoring system and method |
US20020075304A1 (en) * | 2000-12-18 | 2002-06-20 | Nortel Networks Limited | Method and system for supporting communications within a virtual team environment |
US20020075306A1 (en) * | 2000-12-18 | 2002-06-20 | Christopher Thompson | Method and system for initiating communications with dispersed team members from within a virtual team environment using personal identifiers |
US7099912B2 (en) * | 2001-04-24 | 2006-08-29 | Hitachi, Ltd. | Integrated service management system |
JP2003186833A (en) * | 2001-12-20 | 2003-07-04 | Hitachi Ltd | Responsiveness measurement evaluation device and distributed computer system using this device |
JP3969089B2 (en) * | 2001-12-25 | 2007-08-29 | 株式会社日立製作所 | Hierarchical server system |
US6996583B2 (en) | 2002-07-01 | 2006-02-07 | International Business Machines Corporation | Real-time database update transaction with disconnected relational database clients |
JP4516306B2 (en) * | 2003-11-28 | 2010-08-04 | 株式会社日立製作所 | How to collect storage network performance information |
JP4855655B2 (en) * | 2004-06-15 | 2012-01-18 | 株式会社ソニー・コンピュータエンタテインメント | Processing management apparatus, computer system, distributed processing method, and computer program |
2007
- 2007-12-03 TW TW096145883A patent/TW200841189A/en unknown
- 2007-12-26 JP JP2008552140A patent/JP4866429B2/en not_active Expired - Fee Related
- 2007-12-26 WO PCT/JP2007/075035 patent/WO2008081844A1/en active Application Filing
- 2007-12-26 KR KR1020097011538A patent/KR101033447B1/en not_active IP Right Cessation
- 2007-12-26 CN CN2007800483151A patent/CN101568905B/en not_active Expired - Fee Related
- 2007-12-27 US US11/964,858 patent/US20080215325A1/en not_active Abandoned
2011
- 2011-10-03 US US13/252,009 patent/US9128836B2/en not_active Expired - Fee Related
Patent Citations (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5761637A (en) * | 1994-08-09 | 1998-06-02 | Kabushiki Kaisha Toshiba | Dialogue-sound processing apparatus and method |
US5806021A (en) * | 1995-10-30 | 1998-09-08 | International Business Machines Corporation | Automatic segmentation of continuous text using statistical approaches |
US6694055B2 (en) * | 1998-07-15 | 2004-02-17 | Microsoft Corporation | Proper name identification in chinese |
US20040199375A1 (en) * | 1999-05-28 | 2004-10-07 | Farzad Ehsani | Phrase-based dialogue modeling with particular application to creating a recognition grammar for a voice-controlled user interface |
US6496799B1 (en) * | 1999-12-22 | 2002-12-17 | International Business Machines Corporation | End-of-utterance determination for voice processing |
US20020032564A1 (en) * | 2000-04-19 | 2002-03-14 | Farzad Ehsani | Phrase-based dialogue modeling with particular application to creating a recognition grammar for a voice-controlled user interface |
US6873953B1 (en) * | 2000-05-22 | 2005-03-29 | Nuance Communications | Prosody based endpoint detection |
US20020184373A1 (en) * | 2000-11-01 | 2002-12-05 | International Business Machines Corporation | Conversational networking via transport, coding and control conversational protocols |
US7177810B2 (en) * | 2001-04-10 | 2007-02-13 | Sri International | Method and apparatus for performing prosody-based endpointing of a speech signal |
US7313526B2 (en) * | 2001-09-05 | 2007-12-25 | Voice Signal Technologies, Inc. | Speech recognition using selectable recognition modes |
US7076430B1 (en) * | 2002-05-16 | 2006-07-11 | At&T Corp. | System and method of providing conversational visual prosody for talking heads |
US7801838B2 (en) * | 2002-07-03 | 2010-09-21 | Ramp Holdings, Inc. | Multimedia recognition system comprising a plurality of indexers configured to receive and analyze multimedia data based on training data and user augmentation relating to one or more of a plurality of generated documents |
US7567902B2 (en) * | 2002-09-18 | 2009-07-28 | Nuance Communications, Inc. | Generating speech recognition grammars from a large corpus of data |
US7373300B1 (en) * | 2002-12-18 | 2008-05-13 | At&T Corp. | System and method of providing a spoken dialog interface to a website |
US7818174B1 (en) * | 2003-01-16 | 2010-10-19 | Comverse, Inc. | Speech-recognition grammar analysis |
US20040193400A1 (en) * | 2003-03-24 | 2004-09-30 | Mcdonald David D. | Method and system for producing cohesive phrases from fixed phrases in a natural language system |
US7493251B2 (en) * | 2003-05-30 | 2009-02-17 | Microsoft Corporation | Using source-channel models for word segmentation |
US7493257B2 (en) * | 2003-08-06 | 2009-02-17 | Samsung Electronics Co., Ltd. | Method and apparatus handling speech recognition errors in spoken dialogue systems |
US7756709B2 (en) * | 2004-02-02 | 2010-07-13 | Applied Voice & Speech Technologies, Inc. | Detection of voice inactivity within a sound stream |
US20060287847A1 (en) * | 2005-06-21 | 2006-12-21 | Microsoft Corporation | Association-based bilingual word alignment |
US20070071206A1 (en) * | 2005-06-24 | 2007-03-29 | Gainsboro Jay L | Multi-party conversation analyzer & logger |
US20070067172A1 (en) * | 2005-09-22 | 2007-03-22 | Minkyu Lee | Method and apparatus for performing conversational opinion tests using an automated agent |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080154594A1 (en) * | 2006-12-26 | 2008-06-26 | Nobuyasu Itoh | Method for segmenting utterances by using partner's response |
US8793132B2 (en) * | 2006-12-26 | 2014-07-29 | Nuance Communications, Inc. | Method for segmenting utterances by using partner's response |
US10134425B1 (en) * | 2015-06-29 | 2018-11-20 | Amazon Technologies, Inc. | Direction-based speech endpointing |
US11886170B2 (en) | 2019-03-15 | 2024-01-30 | Omron Corporation | Control system, setting device and setting program |
Also Published As
Publication number | Publication date |
---|---|
CN101568905B (en) | 2011-10-12 |
US9128836B2 (en) | 2015-09-08 |
KR101033447B1 (en) | 2011-05-09 |
KR20090102747A (en) | 2009-09-30 |
JPWO2008081844A1 (en) | 2010-04-30 |
CN101568905A (en) | 2009-10-28 |
US20120023366A1 (en) | 2012-01-26 |
JP4866429B2 (en) | 2012-02-01 |
WO2008081844A1 (en) | 2008-07-10 |
TW200841189A (en) | 2008-10-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8793132B2 (en) | | Method for segmenting utterances by using partner's response |
US20080215325A1 (en) | | Technique for accurately detecting system failure |
US6839667B2 (en) | | Method of speech recognition by presenting N-best word candidates |
US8972243B1 (en) | | Parse information encoding in a finite state transducer |
JP4657736B2 (en) | | System and method for automatic speech recognition learning using user correction |
US6910012B2 (en) | | Method and system for speech recognition using phonetically similar word alternatives |
US8886534B2 (en) | | Speech recognition apparatus, speech recognition method, and speech recognition robot |
US9916826B1 (en) | | Targeted detection of regions in speech processing data streams |
US20070239455A1 (en) | | Method and system for managing pronunciation dictionaries in a speech application |
US20110054901A1 (en) | | Method and apparatus for aligning texts |
US9484019B2 (en) | | System and method for discriminative pronunciation modeling for voice search |
US20220383862A1 (en) | | Cross-lingual speech recognition |
US20140337024A1 (en) | | Method and system for speech command detection, and information processing system |
US20200184958A1 (en) | | System and method for detection and correction of incorrectly pronounced words |
US11211065B2 (en) | | System and method for automatic filtering of test utterance mismatches in automatic speech recognition systems |
US20020123893A1 (en) | | Processing speech recognition errors in an embedded speech recognition system |
WO2006083020A1 (en) | | Audio recognition system for generating response audio by using audio data extracted |
US6963834B2 (en) | | Method of speech recognition using empirically determined word candidates |
US20170270923A1 (en) | | Voice processing device and voice processing method |
JP2010078877A (en) | | Speech recognition device, speech recognition method, and speech recognition program |
JP2000029492A (en) | | Speech interpretation apparatus, speech interpretation method, and speech recognition apparatus |
KR102069697B1 (en) | | Apparatus and method for automatic interpretation |
KR20130050132A (en) | | Voice recognition apparatus and terminal device for detecting misprononced phoneme, and method for training acoustic model |
KR102299269B1 (en) | | Method and apparatus for building voice database by aligning voice and script |
US6772116B2 (en) | | Method of decoding telegraphic speech |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: HORII, HIROSHI; TAI, HIDEKI; YAMAMOTO, GAKU; REEL/FRAME: 020875/0182; Effective date: 20080107 |
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |