CA1181525A - Pattern recognition method - Google Patents

Pattern recognition method

Info

Publication number
CA1181525A
Authority
CA
Canada
Prior art keywords
candidates
pattern
value
input
inference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
CA000386244A
Other languages
French (fr)
Inventor
Akira Ichikawa
Hiroko Matsuzaka
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from JP55129327A (patent JPS5754999A)
Priority claimed from JP56076564A (patent JPS57191786A)
Application filed by Hitachi Ltd
Application granted
Publication of CA1181525A
Legal status: Expired

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/254: Fusion techniques of classification results, e.g. of results related to same input data

Abstract

ABSTRACT OF THE DISCLOSURE
The present invention relates to a pattern recognition method for precisely recognizing phonemes, i.e., units of language. Two or more candidates that are likely to be the input pattern are selected based upon the identified values obtained when the input patterns are checked against the standard patterns. A sole candidate that is most probable as the input pattern is then selected, relying not only upon the identified values of the selected candidates but also upon the relations between each candidate and the other candidates.

Description

PATTERN RECOGNITION METHOD
Background of the Invention
Field of the Invention
The present invention relates to a pattern recognition method and, more particularly, to an improved pattern recognition method which precisely recognizes phonemes that correspond to signs of a language and that constitute confusing letters or sounds.
Description of the Prior Art
In conventional pattern recognition methods, such as methods for recognizing letters and voices, an input pattern is compared with standard patterns, and the input is determined to be the pattern bearing the category name of the standard pattern that gives the optimum degree of identification.
In recognizing letters, when, for example, the letter "大" (large) is introduced, the check-up generally also succeeds well against "犬" (dog) or "太" (thick), in addition to the standard pattern of the letter "大" (large). In recognizing voices, when, for example, the sound /t/ is introduced, the check-up usually succeeds well against the other voiceless stop consonants such as /p/ or /k/, or against /d/, /z/, or /s/, which have the same place of articulation. There is therefore a great probability of erroneous recognition among such similar patterns, and the ability of recognition is decreased.
In recognizing phonemes, for example in recognizing the voice produced by a physical phenomenon such as vibration of the vocal organs, the phonemes that constitute the voice are produced under limited physical conditions, such as the length of the vocal organs, and may be greatly affected by the preceding or succeeding phoneme and by the speed of speech.
Therefore, it is very difficult to precisely recognize a phoneme.
In order to overcome the above difficulty, a method was once proposed according to which a spoken word containing deformed phonemes was checked, as a practical recognition unit, against a standard pattern.
According to the above method, however, it was necessary to prepare standard patterns of such large units as spoken words consisting of a combination of phonemes and, hence, to store in the memory the standard patterns of all spoken words that were to be recognized. Since a memory having a tremendous capacity was necessary, it was virtually impossible to construct a voice recognizing apparatus capable of recognizing any voices, e.g. as would be required for a so-called voice typewriter.
In order to recognize any voices, therefore, it becomes an essential requirement to perform the recognition on the phoneme level.
As mentioned above, however, recognition on the phoneme level presents the following problems:
1) It becomes difficult to perform the recognition as the phoneme is deformed.
2) A phoneme has a length considerably shorter than that of a word, and confusion arises among different phonemes.
3) The voice is continuously produced in the direction of time, and it is necessary to cut out the phoneme as a sectional pattern from the continuous voice pattern. It is, however, very difficult to properly cut out the sectional patterns.
With respect to the above problem 3), a system called the continuous DP (dynamic programming) matching method has been proposed in order to continuously match the introduced voice pattern with the standard pattern without the need to cut the continuously produced voice pattern after each predetermined period of time, and the effectiveness of the continuous DP matching method has been confirmed ("Continuous Speech Recognition by Continuous DP Matching" by Ryuichi Oka, Technical Report of Acoustic Society of Japan, S78-20).
To cope with the problems 1) and 2), on the other hand, methods have been proposed in order to:
i) Increase the kinds of characteristic parameters so that the slightest differences among the phonemes can be detected;
ii) Prepare standard patterns to emphasize consonant portions of the phonemes; and
iii) Improve the matching method so that it is less affected by the deformed phonemes.
None of the above methods, however, has produced satisfactory results.
Summary of the Invention
The object of the present invention is to provide a pattern recognition method which is capable of properly recognizing even confusing patterns in view of the above-mentioned facts, i.e., to provide a pattern recognition method which eliminates the above-mentioned problems 1) and 2) in recognizing phonemes in order to enhance the recognition factor of the voice patterns.
In order to accomplish the above object, according to the present invention, the standard pattern of the highest certainty obtained by matching an unknown pattern with the standard patterns is decided by utilizing, as recognition information, the matching results of the other standard patterns, inclusive of resembling patterns, in order to reduce erroneous recognition and to increase the recognition factor.
In accordance with an aspect of the invention there is provided a pattern recognition method comprising: checking input patterns with standard patterns; selecting a plurality of candidates that are likely to be input patterns based upon identified values that represent the results of checking; and inferring an input pattern among the plurality of candidates that are likely to be input patterns based on a predetermined extracted criterion of inference relying upon the nature of the selected candidates and the commonness of the nature of each of the candidates with the other candidates.
The principle of the present invention will be described below with reference to phoneme recognition based upon the pattern matching method.
In general, phonemes are not quite irrelevant to each other; there are predetermined rules among the phonemes.
Therefore, the phonemes can be classified into several groups depending upon their common natures. According to the above classifications, the phonemes pertain to several groups depending upon the natures. According to the results of recognition experiments conducted by the inventors of the present invention, the following facts were clarified:
a) A distance obtained by checking a phoneme group having a common nature against the standard pattern is smaller than a distance obtained by checking a phoneme group without a common nature against the standard pattern.
b) Since a phoneme has a small amount of information, even the slightest deformation causes the distance which is the result of checking to vary greatly. There is, however, a predetermined upper limit in the width of variation, and the distance seldom varies in excess of the upper limit.
c) When priority is given to the phonemes depending upon their distances, such that the phoneme having the minimum distance as a result of the checking is entitled to the first order in certainty, the phonemes having higher orders in certainty have, in many cases, a common nature with the phonemes that pertain to the same category, even when the order of phonemes pertaining to the same category as the standard pattern is reversed relative to the order of phonemes that pertain to a different category.
Conversely, the phonemes without a common nature often have low orders in certainty.
Relying upon these facts, the fundamental principle of the present invention consists of classifying the phonemes having higher orders in certainty, as determined by the check-up, into a plurality of groups depending upon their common natures, and specifying the phonemes that commonly pertain to these groups as the input phonemes.
In this case, it is possible to increase the precision of recognition depending upon whether or not the phonemes having less commonness to other phonemes are located at higher positions in certainty.
What should be set as a common nature for classifying the phonemes, and how, will differ depending upon the characteristic parameters employed for the recognition and the language being discussed. However, a relatively stable classification is realized based upon the following natures:
1) Place of articulation.
2) Manner of production.
However, the manner of production of the sound of the /g/ series of the Japanese language may be either /g/ (voiced stop consonant) or /ŋ/ (nasal consonant). Therefore, the classification based upon the above-mentioned natures is not always satisfactory.
In concretely constructing an apparatus, therefore, the phonemes should be classified depending upon the nature which is determined based upon the language or the characteristic parameters.

Brief Description of the Drawings
Fig. 1 is a diagram showing an example of the results obtained by classifying the candidates of recognition depending upon their common natures;
Fig. 2 is a diagram illustrating quantities that represent similarity between the phonemes in the input patterns and the phonemes in the standard patterns as well as correction quantities for the phonemes in the input patterns;
Fig. 3 is a diagram showing an example of results of recognition by the first and second methods of the present invention;
Fig. 4 is a block diagram showing the principle of a pattern recognition apparatus according to a third method of the present invention;
Fig. 5 is a diagram showing an example of average similarity between an input pattern (i) and a standard pattern (j);
Fig. 6 is a block diagram of a voice recognition apparatus according to an embodiment of the present invention;
Fig. 7 is a diagram showing a flow chart for checking phonemes according to the first and second methods of the present invention; and
Fig. 8 is a flow chart for checking phonemes according to the third method of the present invention.

Description of the Preferred Embodiments
Embodiments of the invention will be described below in detail with reference to concrete data.
First, a registered unit of a standard pattern is set as a vowel - consonant - vowel (a so-called VCV unit).
This unit, however, need not be limited to the VCV unit provided it is lower than a level of linguistic signs of voices such as syllables and phonemes.
If the word /atataka/ is fed as an input voice, the following distances, from the first place to the sixth place, are obtained as the results of checking against the various VCV's that are prepared as standard patterns, for recognizing the second consonant /t/ of the word.

/aka/ : 1.53
/ada/ : 1.54
/aza/ : 1.58
/ata/ : 1.64
/apa/ : 1.65
/asa/ : 1.72          ----- (1)
From the above results, the consonant in the input voice will, according to a conventional method, be erroneously recognized as /k/ of /aka/, which gives the minimum distance. The present invention provides a method which precludes this defect, and which extracts /t/ as the first candidate and correct answer from /ata/, which is in the fourth place from the viewpoint of distance. According to the results of recognition experiments conducted by the inventors of the present invention, the distance of the VCV that is the correct answer does not become greater than the minimum distance among all VCV's by more than 0.2 when the sampling frequency of the input voice is 8 kHz, the Hamming window in the continuous non-linear matching (usually referred to as DP matching) is 20 msec, and the frame distance is 10 msec. In the above-mentioned example, based upon this result, the VCV's serving as candidates of recognition (the six distances, first to sixth, in relation (1)) are those having distances smaller than 1.53 + 0.3 = 1.83, i.e., not greater by more than 0.3 than the minimum distance 1.53 (the first distance in relation (1)).
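The candidate selection step described above can be sketched as follows. This is a minimal illustration, not the patented apparatus: the function name `select_candidates` and the dictionary layout are assumptions introduced here, and the distances are those of relation (1), with 1.64 for /ata/ as quoted for that VCV later in the text.

```python
def select_candidates(distances, margin):
    """Return the patterns whose distance lies within `margin` of the best one."""
    d_min = min(distances.values())
    return {name: d for name, d in distances.items() if d < d_min + margin}


# Distances of relation (1), keyed by the VCV standard pattern.
relation_1 = {
    "aka": 1.53, "ada": 1.54, "aza": 1.58,
    "ata": 1.64, "apa": 1.65, "asa": 1.72,
}

# Margin of 0.3 as in the worked example (1.53 + 0.3 = 1.83).
candidates = select_candidates(relation_1, margin=0.3)
print(sorted(candidates, key=candidates.get))
```

With these inputs all six VCV's survive the threshold, so the correct answer /ata/ stays available to the later inference step even though it is not the nearest pattern.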
According to the first method of the present invention, the consonants (including the consonant /t/ of the correct answer) in the six VCV's extracted as candidates of recognition are examined for their commonness.
The following facts can then be understood. The /k/ and /p/, which are voiceless stop consonants, are in agreement with each other in their manner of production, and pertain to the same group. The /d/, /z/ and /s/ have a point of articulation at the tip of the tongue, are in agreement with each other in regard to their place of articulation, and pertain to the same group.

Fig. 1 shows the six consonants which are candidates, the consonants which can be classified into the same group as each of them from the viewpoint of the manner of production and the place of articulation, and the total number (N) in each group.
According to Fig. 1, the greatest number of consonants can be classified into the same group as the consonant /t/ of the correct answer. There are two such consonants from the viewpoint of the manner of production, and three from the viewpoint of the place of articulation. The total number N inclusive of /t/ is 6.
Therefore, if the voice which is introduced is inferred with the magnitude of N as a criterion for inference, it is possible to obtain a correctly recognized result.
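Counting N for each candidate can be sketched as below. Fig. 1 is not reproduced in the text, so the group table `GROUPS` is a hypothetical stand-in built only from the statements that /k/ and /p/ share the manner of production of /t/ and that /d/, /z/ and /s/ share its place of articulation; the function name is likewise an assumption.

```python
# Hypothetical group table standing in for Fig. 1 (an assumption).
GROUPS = {
    "manner: voiceless stop": {"k", "t", "p"},
    "place: tip of tongue": {"d", "z", "s", "t"},
}


def commonness_count(consonant, candidates, groups=GROUPS):
    """Number N of candidates sharing some group with `consonant`, itself included."""
    related = {consonant}
    for members in groups.values():
        if consonant in members:
            related |= members & candidates
    return len(related)


candidates = {"k", "d", "z", "t", "p", "s"}
counts = {c: commonness_count(c, candidates) for c in sorted(candidates)}
best = max(counts, key=counts.get)
print(counts, best)
```

Under this assumed table /t/ reaches N = 6 and /k/ reaches N = 3, which matches the two values the description states explicitly.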
Next, in order to enhance the precision of recognition, new distances reflecting the classified results of Fig. 1 are found from the distances obtained by the check-up, and the introduced voice is inferred with these new distances as the criterion of inference.
Referring to relation (1), let the distance of the i-th order be denoted by $d_i$, the minimal value among $d_1$ to $d_6$ by $d_{min}$ (1.53 of /aka/), the number of consonants that pertain to the same group as the consonant of the i-th order in Fig. 1 by $N_i$, and the distances of the VCV's corresponding to these $N_i$ consonants by $d_{ij}$ ($j = 1, 2, \ldots, N_i$). In the case of /k/, for example, $i = 1$ and $N_1 = 3$, with $d_{11} = 1.53$ (/aka/), $d_{12} = 1.64$ (/ata/) and $d_{13} = 1.65$ (/apa/). The following new distance $d_i'$ can then be defined for the distance of the i-th order of relation (1):
$$d_i' = w_1 \cdot w_2 \cdot w_3 \tag{2}$$
Here, $w_1$ denotes a weighting quantity which represents an improved result of recognition with the increase in the number of consonants that pertain to the same group. For instance,
$$w_1 = \frac{1}{N_i} \tag{3}$$
Symbol $w_2$ denotes a weighting quantity which represents an improved result of recognition with the decrease in the distance that is the result of the check-up. For instance,
$$w_2 = 1 + d_i - d_{min} \tag{4}$$
Symbol $w_3$ denotes a weighting quantity which represents an improved result of recognition with the decrease of the distances that are the results of check-ups relative to the VCV's that pertain to the same group. For instance,
$$w_3 = \frac{1}{N_i} \sum_{j=1}^{N_i} d_{ij} \tag{5}$$
The distances $d_i'$ ($i = 1, 2, \ldots, 6$) of equation (2), calculated using the weighting quantities $w_1$ to $w_3$ given by equations (3) to (5), are indicated as follows in the order corresponding to the first to sixth entries of relation (1).
1 /aka/ : 0.54
2 /ada/ : 0.41
3 /aza/ : 0.43
4 /ata/ : 0.30
5 /apa/ : 0.60
6 /asa/ : 0.42          ----- (6)
The distance $d_4'$ corresponding to /ata/, which is the correct recognition result, assumes the minimal value 0.30. This verifies the effectiveness of the first method of the present invention.
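The weighted distance of equation (2) can be checked numerically with the sketch below. It takes equation (2) to be the product of the three weights of equations (3) to (5); the group sets are an assumption standing in for Fig. 1 (only N = 3 for /k/ and N = 6 for /t/ are stated in the text), and the names `weighted_distances` and `d_prime` are introduced here for illustration.

```python
def weighted_distances(dist, groups):
    """Compute d_i' = w1 * w2 * w3 for every candidate consonant in `dist`."""
    d_min = min(dist.values())
    result = {}
    for c, d in dist.items():
        related = {c}
        for members in groups:
            if c in members:
                related |= members & set(dist)
        n = len(related)                          # N_i
        w1 = 1.0 / n                              # equation (3)
        w2 = 1.0 + d - d_min                      # equation (4)
        w3 = sum(dist[r] for r in related) / n    # equation (5)
        result[c] = w1 * w2 * w3                  # equation (2)
    return result


# Distances of relation (1), keyed by the middle consonant of each VCV.
dist = {"k": 1.53, "d": 1.54, "z": 1.58, "t": 1.64, "p": 1.65, "s": 1.72}
# Assumed groups: voiceless stops, and consonants articulated at the tongue tip.
groups = [{"k", "t", "p"}, {"d", "z", "s", "t"}]

d_prime = weighted_distances(dist, groups)
print({c: round(v, 2) for c, v in d_prime.items()})
print(min(d_prime, key=d_prime.get))
```

Under these assumptions the values for /aka/, /ada/, /ata/ and /apa/ round to the figures listed in (6), and /t/ comes out with the smallest weighted distance.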
According to the results of a recognition experiment conducted by the inventors of the present invention, a recognition factor of 95% can be achieved by using the distance $d_i'$ of the present invention, compared with the recognition factor of 7[?]% of the conventional method.
In the above description, it was presumed that the number of VCV's pertaining to the same group is nearly equal for all of the VCV's. Some VCV's, however, may pertain to the same group in reduced numbers.
With regard to such VCV's, the weight ($w_1$ of equation (3)) based on the number of VCV's pertaining to the group is modified and balanced, or the modification is effected depending upon whether there is any candidate having a different nature among those classified into the same group as candidates of recognition. As for the candidate having a different nature, the weighting quantities corresponding to equations (3) to (5) and the distance $d_i''$ corresponding to $d_i'$ of equation (2) are found depending upon the nature of the candidate, and the modification is effected depending upon the ratio $d_i'/d_i''$.
If the likelihood ratio is used, a VCV close to the average spectral characteristics tends to appear as a candidate of recognition for various VCV's and also loses likelihood ratio value correspondingly. However, since a VCV having a feature of great deviation appears as a candidate only for specific groups, it is possible to modify the distance $d_i$ beforehand by utilizing the above-mentioned nature.

The above description has dealt with the method in which the degree of commonness is expressed in two steps, i.e., "1" (common) or "0" (not common). The consonant /k/ of Fig. 1, for example, has commonness with the consonants /t/ and /p/ in regard to the manner of production and, hence, has a similarity degree of 1 to them, and has no commonness with the other consonants /d/, /z/ or /s/ in regard to either the manner of production or the place of articulation and, hence, has a similarity degree of 0 to them. In other words, the above description has dealt with the method which equally handles the objects of recognition that pertain to the same group relying upon the common nature. Described below is a second method according to the present invention, in which the common nature is expressed by any numerical value between 0 and 1 depending upon the degree of commonness, in order to fairly evaluate the commonness among the phonemes and to correct the deviation in the number of similar phonemes.
First, the similarity degrees $P_{IJ}$ between the phonemes I in the input voices that are to be recognized and the phonemes J in the standard patterns are found and tabulated. The similarity degrees $P_{IJ}$ may be prepared relying upon quantities phonemically defined based on common terms of discriminated features, or may be prepared utilizing the results of checking in the apparatus for recognizing the voice.

Fig. 2 tabulates concrete examples of quantities corresponding to the similarity degree $P_{IJ}$. In this case, the value for I = J is taken as 1, values within the range of 0 to 1 are rounded to 0.0, 0.2, 0.4, 0.5, 0.8 or 1.0, and the results are multiplied by 100.
The similarity degree PIJ is a quantity which represents the degree of similarity between I and J.
Therefore, (1 - PIJ) can be regarded as a quantity which represents the degree of non-similarity between I and J.
The unknown voice which is introduced is now denoted by I, and is matched to the standard patterns J to utilize the L distances that have the greatest similarities (in the following description, the similarity is defined by the distance $d_{IJ}$: the smaller the distance $d_{IJ}$, the greater the similarity), i.e., to utilize the L distances that lie inside a predetermined threshold value. If these distances are denoted as follows in the order of increasing quantities,
$$d_{I1},\ d_{I2},\ d_{I3},\ \ldots,\ d_{IL} \tag{7}$$
the unknown voice I which is introduced will be specified as one among 1 to L.
In inferring that the unknown voice is I based upon these quantities, the precision of inference can be increased through the following processing.
First, if
$$S_I = \sum_{J=1}^{L} (1 - P_{IJ}) \tag{8}$$
is calculated, $S_I$ becomes a quantity that indicates the degree to which the input voice is not I.
Moreover, an increasing distance $d_{IJ}$ serves as a quantity that indicates an increasing degree to which I is not J.
Therefore, if $S_I$ and $d_{IJ}$ are combined together to define
$$d_I' = \sum_{J=1}^{L} (1 - P_{IJ})\, d_{IJ} \tag{9}$$
it is considered that $d_I'$ becomes a quantity that indicates the degree to which the unknown voice is not I. By using this quantity as a criterion of inference, it is possible to infer the voice to be $I_0$ when
$$d_{I_0}' = \min[d_1', d_2', d_3', \ldots, d_L']$$
The distance $d_I'$ calculated according to equation (9) corresponds to $d_i'$ of equation (2). When the weighting quantity $w_3$ of equation (2) is found, however, the candidate distances
$$d_{i1},\ d_{i2},\ d_{i3},\ \ldots,\ d_{iN_i}$$
are all treated equally, as given by equation (5).
According to equation (9), on the other hand, the weighting $(1 - P_{IJ})$ is applied to all of the candidate distances
$$d_{I1},\ d_{I2},\ d_{I3},\ \ldots,\ d_{IL}$$
depending upon the similarity between I and J ($J = 1, 2, \ldots, L$) to find the distance $d_I'$, which is a weighted average.
Therefore, it is possible to find a distance which more faithfully reflects the distance relative to the standard pattern.
In the case of an input voice I having a small number of similar phonemes, the number of candidates L given by relation (7) is small, and the distance $d_I'$ is generally large, making it difficult to perform correct recognition.
To correct this, a correction coefficient $C_I$ for the distance $d_I'$ is introduced to define
$$d_I'' = C_I \cdot d_I' = C_I \sum_{J=1}^{L} (1 - P_{IJ})\, d_{IJ} \tag{10}$$
and, using the above quantity as a criterion of inference, the voice is inferred to be $I_0$ based upon the relation
$$d_{I_0}'' = \min[d_1'', d_2'', d_3'', \ldots, d_L'']$$
For example, the correction coefficient $C_I$ is calculated as follows (numerical values are concretely shown in the bottom row of Fig. 2) based upon $P_{IJ}$, which corresponds to 1/100 of the numerical values of Fig. 2:
$$C_I = \sum_{J=1}^{M} P_{IJ} \tag{11}$$
where M denotes the total number of the standard patterns which are prepared.

In the case of phonemes having large $C_I$ values, there exist a lot of similar phonemes, and the distance $d_I'$ of equation (9) tends to become small. Therefore, use of the distance $d_I''$ corrected by $C_I$ enables the phonemes to be fairly recognized.
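The second method can be sketched as follows. The printed equations (8) to (11) are damaged in the source, so this sketch assumes the forms $d_I' = \sum_J (1 - P_{IJ}) d_{IJ}$, $C_I = \sum_J P_{IJ}$ and $d_I'' = C_I d_I'$; the similarity table and the distances are invented for illustration and are not the values of Fig. 2 or Fig. 3.

```python
def second_method(distances, similarity, all_standards):
    """Return {I: (d_I', d_I'')} for every candidate phoneme I."""
    scores = {}
    for i in distances:  # hypothesis: the input voice is phoneme i
        # Assumed equation (9): distances weighted by non-similarity to i.
        d_prime = sum((1.0 - similarity[i][j]) * d_j
                      for j, d_j in distances.items())
        # Assumed equation (11): total similarity of i over all standard patterns.
        c_i = sum(similarity[i][j] for j in all_standards)
        # Assumed equation (10): corrected distance.
        scores[i] = (d_prime, c_i * d_prime)
    return scores


# Hypothetical similarity degrees P_IJ (symmetric, 1 on the diagonal).
P = {
    "s": {"s": 1.0, "z": 0.8, "t": 0.4},
    "z": {"s": 0.8, "z": 1.0, "t": 0.2},
    "t": {"s": 0.4, "z": 0.2, "t": 1.0},
}
# Hypothetical matching distances of one unknown input against each standard.
d = {"s": 1.60, "z": 1.55, "t": 1.50}

scores = second_method(d, P, all_standards=["s", "z", "t"])
best = min(scores, key=lambda k: scores[k][1])
print(scores, best)
```

In this toy case the raw distance would pick /t/, while both $d_I'$ and $d_I''$ pick /s/, because /s/ is supported by the closeness of its similar neighbour /z/.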
According to the recognition experiments conducted by the inventors of the present invention, nine objects were erroneously recognized among about 100 objects when the distance $d_{IJ}$ was employed. When the distance $d_I'$ was employed, four objects were erroneously recognized.
Further, when the distance $d_I''$ was employed, only one object was erroneously recognized.
Fig. 3 shows the results of recognition using the distances $d_I'$ and $d_I''$ for the four consonants whose distance $d_{IJ}$ usually ranges from the first order to the fourth order from the smaller side in case the input voice to be recognized is the consonant /s/.
In Fig. 3, the consonant is correctly recognized as /s/ when $d_I''$ is used, even though it may be erroneously recognized as /t/ or /z/ when $d_{IJ}$ or $d_I'$ is used.
According to the above two methods, part of the standard patterns is selected as candidates for recognition based upon the checked-up values, and an unknown pattern is inferred from the candidates relying upon a predetermined criterion of inference.

A third method of the present invention will be described below, using a criterion of inference extracted from the combined information of the input pattern and a plurality of standard patterns.
Let an input pattern be denoted by i, a standard pattern by j, the degree of similarity corresponding to the checked-up value of the input pattern i and the standard pattern j by $d_{ij}$, the appearing probability of the input pattern i by $p(i)$, the probability that the similarity degree between the input pattern i and the standard pattern j is $d_{ij}$ by $p(d_{ij} \mid i, j)$, the probability that the input pattern is i when the similarity degree is $d_{ij}$ by $p(i \mid d_{ij})$, and the probability that the input pattern i is checked with the standard pattern j by $p(i, j)$. Checking the input pattern i with the standard pattern j then indicates that the probability $p(i \mid i, j)$ that the input pattern i comes into agreement with the standard pattern j is given by
$$p(i \mid i, j) = p(i) \cdot p(i, j) \cdot p(d_{ij} \mid i, j) \cdot p(i \mid d_{ij}) \tag{12}$$
According to the conventional method, j is presumed to be equal to i, and the input pattern is specified by the i which satisfies
$$\max_i\, p(i \mid i, i) = p(i) \cdot p(i, i) \cdot p(d_{ii} \mid i, i) \cdot p(i \mid d_{ii}) \tag{13}$$
According to the third method of the present invention, on the other hand, the input pattern is specified by the i which maximizes the relation
$$\max_i \sum_{j=1}^{N} p(i \mid i, j) = \max_i \sum_{j=1}^{N} p(i) \cdot p(i, j) \cdot p(d_{ij} \mid i, j) \cdot p(i \mid d_{ij}) \tag{14}$$
where N denotes the total number of standard patterns, using $\sum_{j=1}^{N} p(i \mid i, j)$ as a criterion of inference.
The probability $p(i)$ can be statistically determined from the distribution of patterns. For example, the phonemes of the Japanese language can be recognized by utilizing the results of investigation concerning the frequency of phonemes.
When all of the standard patterns and input patterns are checked, $p(i, j) = 1/N$. The probability $p(d_{ij} \mid i, j)$ and the probability $p(i \mid d_{ij})$ can be determined by defining the practical characteristic parameters and similarity degrees, and by observing the distribution of the data correspondingly. The distribution of $d_{ij}$ differs depending upon the parameters and the similarity degree. When i = j, in particular, the distribution often becomes asymmetrical with respect to the average value $\bar{d}_{ij}$ of $d_{ij}$. In many cases, however, the distribution is symmetrical and can be approximated by the normal distribution. Therefore, it is convenient in practice to normalize the distribution with a dispersion $\sigma_{ij}$ and to treat it as a function of $\delta_{ij} = (d_{ij} - \bar{d}_{ij})/\sigma_{ij}$. If $p(d_{ij} \mid i, j) \cdot p(i \mid d_{ij})$ is approximated with the normal distribution as
$$p(d_{ij} \mid i, j) \cdot p(i \mid d_{ij}) \propto e^{-\delta_{ij}^2/2} \tag{15}$$
the value of equation (15) increases with the decrease in $\delta_{ij}$. Therefore, the objects over which the sum of equation (14) is taken may be limited to the n combinations of i and j having a small value $\delta_{ij}$ (in this case, equation (14) is treated with regard to n values, n being smaller than the total number N). When the likelihood ratio or a square distance is used as the similarity degree, a value among patterns having small similarity undergoes a great change even for a slight change in the patterns, and becomes unstable. Due to this instability, the value $\sigma_{ij}$ becomes great and the apparent value $\delta_{ij}$ becomes small. In such a case, the objects over which the sum of equation (14) is taken are not simply limited to those having a small value $\delta_{ij}$; the value $d_{ij}$ itself is also limited to those having increased certainty (i.e., having a small likelihood ratio or distance). Even in this case, equation (14) is executed for the outputs that correspond to n standard patterns, n being smaller than the total number N. Hereafter, the total number N is taken to include n in this sense.
Accordingly, it is possible to specify the input pattern using the i which approximately attains
$$\min_i \frac{1}{N} \sum_{j=1}^{N} \frac{d_{ij} - \bar{d}_{ij}}{\sigma_{ij}} \tag{16}$$
instead of equation (14). Furthermore, if $a_{ij} = 1/(N \sigma_{ij})$, and equation (16) is given by
$$\min_i \sum_{j=1}^{N} a_{ij}\,(d_{ij} - \bar{d}_{ij}) \tag{17}$$
there is no need of effecting the division.
Discussed below is a modified method based upon the idea of the matching method according to the above-mentioned third method, utilizing the information consisting of combinations of i and j. Equation (17) is modified as follows:
$$d_i' = d_{ii} + \frac{w}{N-1} \sum_{j \neq i}^{N} a_{ij}\,(c_0 - d_{ij}) \tag{18}$$
where w denotes the weight, and $a_{ij}$ and $c_0$ denote constants.

Here, $a_{ij}$ is defined as follows:
$$a_{ij} = c_{ij} - c_0 \tag{19}$$
with the average value of $d_{ij}$ as $c_{ij}$ ($c_{ij} = \bar{d}_{ij}$). The constant $c_0$ is so determined that $d_{ij}$ does not usually become greater than it when the input pattern i and the standard pattern j have commonness with regard to some nature, and that $d_{ij}$ does not become smaller than it when the input pattern i and the standard pattern j do not have commonness. If the constant $c_0$ is determined as mentioned above, $a_{ij}(c_0 - d_{ij})$ in equation (18) assumes a negative value in most cases when the input pattern i and the standard pattern j have commonness in regard to some nature, and assumes a positive value in most cases when there is no commonness between i and j. Therefore, the second term of equation (18), i.e.,
$$\frac{w}{N-1} \sum_{j \neq i}^{N} a_{ij}\,(c_0 - d_{ij})$$
works to correct the result $d_{ij}$ of the j-th matching portion depending upon the degree of commonness to the results $d_{ij}$ of the other matching portions. In particular cases, it is allowable to set $a_{ij} = 0$. In this case, the operation for the correction term for that combination can be eliminated to reduce the quantity of operation. When the phonemic commonness is very small, the value $d_{ij}$ will often become unstable. For such combinations, therefore, the value $a_{ij}$ should be set to 0 beforehand to obtain stable results. Further, a value $d_{ij}$ which is greater than a predetermined level will not be reliable. Therefore, it is better not to use the term thereof.
Described below is a further specific illustration of the principle of the third method when it is adapted for recognizing voices, particularly for recognizing phonemes in continuous voice.
Fig. 4 is a block diagram of the apparatus for recognizing voice based upon the above-mentioned principle. Fig. 4 principally illustrates a matching portion which executes the operation of equation (14) to illustrate the principle of the third method of the present invention, and shows the flow of signals.
The input voice 1 is converted into characteristic parameters through an analyzing circuit 2, and is sent to identifying circuits 3-1 to 3-N for checking with standard pattern memories 4-1 to 4-N of each of the phonemes. Results 5-1 to 5-N of checking or identification with the phonemes are sent to matching circuits 6-1 to 6-N. Utilizing the results 5-1 to 5-N of checking with the phonemes, the matching circuits 6-1 to 6-N perform calculations corresponding to each of the terms of equation (14), whereby results 7-1 to 7-N are sent to a discriminating circuit 8. The discriminating circuit 8 compares the results, discriminates the phoneme having the highest degree of certainty, and produces a signal 9.

A first system in the third method based upon the equation (14) is illustrated below.
A likelihood ratio of the tenth order is used as the degree of similarity.
First, the registered unit of a standard pattern consists of vowel - consonant - vowel (a so-called VCV unit). This unit need not be limited to the VCV unit provided it is lower than a level of linguistic signs of voices such as syllable or phoneme.
According to the results of recognition experiments conducted by the inventors of the present invention, the distance for the VCV that is the correct answer does not exceed the minimum distance among all of the candidate VCVs by more than 0.2 when the sampling frequency of the input voice is 8 kHz, the Hamming window in the continuous non-linear matching (usually called continuous DP matching) using the dynamic programming method is 20 msec, and the interval between frames is 10 msec. Further, the distance seldom exceeds 2.0 for the VCV that is the correct answer.
When 2.0 is exceeded, the distance should be rejected as it stems from unstable inputs. Therefore, only the values dij that exceed the value having the greatest certainty by no more than 0.4, and that are smaller than 2.0, are used. Below are the results d produced by the identifying circuits 3-1 to 3-N for /k/ in the input voice /Kagaku-hooteishiki/:
First place : /g/ 1.634
Second place : /k/ 1.774
Third place : /b/ 1.910
Fourth place : /p/ 1.927
In equation (17), if a value dia is measured as shown in Fig. 5, and if the dispersion σij is presumed to be 1, then:
First place : /k/ 0.847/4
Second place : /p/ 1.43[illegible]/4
Third place : /b/ 2.237/4
Fourth place : /g/ 3.067/4
Thus, /k/ takes first place.
Below is described a modified method based on equation (18) as a second embodiment of the third method.
When
First place : /g/ 1.634
Second place : /k/ 1.774
Third place : /b/ 1.910
Fourth place : /p/ 1.927
and if c0 = 2.2, w = 1.0, and Cij is given as shown in Fig. 5, the corrected values dij' become:
First place : /k/ 1.672
Second place : /g/ 1.8[illegible]9
Third place : /p/ 1.927
Fourth place : /b/ 1.997
and the correct answer /k/ takes first place.
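The selection rule stated before these worked examples (use only values within 0.4 of the most certain value and smaller than 2.0) can be sketched in Python as follows. The function name `select_candidates` and its parameter names are hypothetical; the two thresholds are the ones quoted in the text for the stated analysis conditions.

```python
def select_candidates(distances, margin=0.4, reject_at=2.0):
    """Keep only the identified values that the text treats as usable.

    distances : dict mapping a phoneme label to its distance
    margin    : largest allowed excess over the smallest distance
    reject_at : distances at or above this level are rejected as unstable
    """
    if not distances:
        return {}
    best = min(distances.values())
    return {
        label: d
        for label, d in distances.items()
        if d - best <= margin and d < reject_at
    }
```

Applied to the four values listed above for /k/, all four candidates pass, since the largest (1.927) is within 0.4 of the smallest (1.634) and below 2.0. This is why the later correction step can still promote /k/ from second place.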
Below is described an apparatus for recognizing the voice according to the present invention, with reference to the situation in which the voice is to be recognized, particularly when the phoneme in the continuous voice is to be recognized.
Fig. 6 is a block diagram of an apparatus for recognizing the voice according to an embodiment of the present invention. In Fig. 6, an input voice 61 passes through a low-pass filter (LPF) 62 for preventing aliasing noise, and is converted into digital signals through an analog-to-digital converter (ADC) 63. Then, a conventional characteristic parameter analyzing circuit 64 produces frame data consisting of a short-term autocorrelation {vi} and a residual power P0 as characteristic parameters after every interval of one frame (for example, 10 msec).
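The frame analysis performed after the ADC can be sketched in pure Python as below. Several points are assumptions rather than statements from the text: the residual power is taken to be the prediction error of a Levinson-Durbin recursion on the autocorrelation, the 20 msec Hamming window and tenth order are borrowed from the experimental conditions quoted earlier, and all function names are hypothetical.

```python
import math


def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(x, order):
    """Short-term autocorrelation v_0 .. v_order of one windowed frame."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(order + 1)]


def residual_power(v):
    """Final prediction error of the Levinson-Durbin recursion on v."""
    err = v[0]
    a = [1.0] + [0.0] * (len(v) - 1)
    for m in range(1, len(v)):
        if err <= 0.0:
            break  # silent or degenerate frame
        k = -sum(a[i] * v[m - i] for i in range(m)) / err
        prev = a[:]
        for i in range(1, m + 1):
            a[i] = prev[i] + k * prev[m - i]
        err *= 1.0 - k * k
    return err


def frame_parameters(signal, fs=8000, win_ms=20, shift_ms=10, order=10):
    """Return one (autocorrelation, residual power) pair per frame."""
    win = fs * win_ms // 1000
    shift = fs * shift_ms // 1000
    window = hamming(win)
    frames = []
    for start in range(0, len(signal) - win + 1, shift):
        x = [s * w for s, w in zip(signal[start:start + win], window)]
        v = autocorrelation(x, order)
        frames.append((v, residual_power(v)))
    return frames
```

With an 8 kHz signal, each frame covers 160 samples and a new frame starts every 80 samples, matching the 10 msec frame interval mentioned in the text.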
The likelihood ratio, which represents the similarity between a series of frame data and a series of frame data of standard patterns stored in a standard pattern memory 66, is calculated by a likelihood ratio calculating circuit 65.
Based upon the thus calculated likelihood ratio, an optimum identified value is processed by a conventional continuous DP matching circuit 67 via an intermediate result memory 68, thereby to calculate the distance {dIJ}.
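The idea of DP matching can be illustrated with a minimal dynamic-programming alignment in Python. This sketch is a plain endpoint-constrained time warping, not the continuous (endpoint-free) DP matching performed by circuit 67, and the `local` callable is a hypothetical stand-in for the frame-to-frame likelihood ratio. The normalization by the summed sequence lengths is also an assumption.

```python
def dp_distance(inp, ref, local):
    """Accumulated distance of the best monotonic alignment of inp to ref.

    inp, ref : sequences of frame data
    local    : callable(frame_a, frame_b) -> non-negative local distance
    """
    inf = float("inf")
    n, m = len(inp), len(ref)
    # g[i][j] = best accumulated cost aligning inp[:i] with ref[:j]
    g = [[inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
            g[i][j] = local(inp[i - 1], ref[j - 1]) + step
    # Normalize so that longer patterns are not penalized for length alone.
    return g[n][m] / (n + m)
```

Because the alignment may stretch either sequence, an input that repeats a frame of the standard pattern still yields a distance of zero.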
The distance {dIJ (J = 1, 2, ...)} is fed to a phoneme identified value processing circuit 600 via a buffer 69, where the recognition processing is carried out according to the method of the present invention, and a final result 610 of the processing of phoneme recognition is produced.
Here, the phoneme identified value processing circuit 600 may be made up of an ordinarily used microprocessor.
When the first and second methods of the present invention are to be carried out using the microprocessor, however, portions surrounded by a dotted line are executed as shown in the flow chart of Fig. 7. Further, when the third method of the present invention is to be performed, the processing is carried out as shown in the flow chart of Fig. 8.
The foregoing description has employed the likelihood ratio as a scale for measuring the similarity. Therefore, the circuits subsequent to the continuous DP matching circuit 67 in Fig. 6 perform such processing that the certainty increases as the value decreases. The same also holds true when the distance is used as a scale for measuring the similarity.

When the correlation is to be used, however, the processing must be carried out in such a way that the certainty increases as the value increases. For example, the reliability must be increased with the increase in the weighting quantities w1, w2 and w3 in equation (2). The present invention naturally includes these modifications.
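The dependence on the direction of the scale can be made explicit with a small Python helper. The name `best_candidate` and its flag are hypothetical; the sketch only shows that the final discrimination picks the smallest value for distance-like scales and the largest value for correlation-like scales.

```python
def best_candidate(scores, higher_is_better=False):
    """Pick the most certain candidate for the given scale direction.

    scores           : dict mapping a candidate label to its final value
    higher_is_better : False for likelihood ratio or distance,
                       True for correlation
    """
    pick = max if higher_is_better else min
    return pick(scores, key=scores.get)
```

The rest of the processing would need the same care: any threshold or weight that assumes "smaller is more certain" has to be reversed when a correlation is used.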
According to the present invention as illustrated in the foregoing, voice elements such as phonemes can be stably and precisely recognized on a level lower than a linguistic level of signs, presenting great effects.

Claims (10)

1. A pattern recognition method comprising: checking input patterns with standard patterns; selecting a plurality of candidates that are likely to be input patterns based upon identified values that represent the results of checking; and inferring an input pattern among the plurality of candidates that are likely to be input patterns based on a predetermined extracted criterion of inference relying upon the nature of the selected candidates and the commonness of the nature of each of the candidates with the other candidates.
2. A pattern recognition method according to claim 1, wherein said criterion of inference is a number of candidates having a nature common to that of said selected candidates, and a candidate having the greatest number of said candidates having a nature common to that of said selected candidate is inferred as the input pattern.
3. A pattern recognition method according to claim 1, wherein said criterion of inference is a product of a value which corresponds to an inverse number of the candidates having a nature common to that of said selected candidates, a value corresponding to said identified value of each of the candidates, and an average value of said identified value in each of the candidates and in the candidates having a nature common to said each of the candidates.
4. A pattern recognition method according to claim 1, wherein said criterion of inference is a value corresponding to a weighted average value of a similarity degree and an identified value between said selected candidates and candidates having a nature common to said candidates.
5. A pattern recognition method according to claim 1, wherein said criterion of inference assumes a quantity given by $p(i) \cdot p(d_{i,j}/i,j) \cdot p(i/d_{i,j}) \cdot p(i,j)$, where p(i) denotes an appearing probability of the input pattern i (i = 1, 2, ..., N), p(di,j/i,j) denotes a probability in which a quantity corresponding to the similarity degree between the input pattern i and the standard pattern j (j = 1, 2, ..., N) is di,j, p(i/di,j) denotes a probability in which the input pattern is i when the quantity corresponding to said similarity degree is di,j, and p(i,j) denotes a probability in which an input pattern i is checked with a standard pattern j.
6. A pattern recognition method for inferring an input pattern comprising the steps:
comparing an input pattern with a plurality of standard patterns;
selecting a plurality of candidates that are likely to be the input pattern based upon identified values that represent the results of the comparison of input pattern with the standard patterns; and inferring an input pattern from the plurality of candidates based upon a predetermined extracted criterion of inference for evaluating the selected plurality of candidates, the predetermined criterion of inference being different than the criteria for selecting the plurality of candidates, and utilizing at least one characteristic parameter of each of the selected plurality of candidates and the commonness of at least one characteristic parameter within each selected candidate and the other remaining selected candidates.
7. A pattern recognition method in accordance with claim 6, wherein said criterion of inference is determined for each of said selected candidates by calculating the number of candidates having a characteristic parameter common to each selected candidate and the input pattern is inferred by choosing the candidate having the greatest calculated number.
8. A pattern recognition method according to claim 6, wherein said criterion of inference is a product of a value which corresponds to an inverse number of the candidates having a characteristic parameter common to that of said selected candidates, a value corresponding to said identified value of each of the candidates, and an average value of said identified value in each of the candidates and in the candidates having a characteristic parameter common to said each of the candidates.
9. A pattern recognition method according to claim 6, wherein said criterion of inference is a value corresponding to a weighted average value of a similarity degree and an identified value between said selected candidates and candidates having a characteristic parameter common to said candidates.
10. A pattern recognition method according to claim 6, wherein said criterion of inference assumes a quantity given by $p(i) \cdot p(d_{i,j}/i,j) \cdot p(i/d_{i,j}) \cdot p(i,j)$, where p(i) denotes an appearing probability of the input pattern i (i = 1, 2, ..., N), p(di,j/i,j) denotes a probability in which a quantity corresponding to the similarity degree between the input pattern i and the standard pattern j (j = 1, 2, ..., N) is di,j, p(i/di,j) denotes a probability in which the input pattern is i when the quantity corresponding to said similarity degree is di,j, and p(i,j) denotes a probability in which an input pattern i is checked with a standard pattern j.
CA000386244A 1980-09-19 1981-09-18 Pattern recognition method Expired CA1181525A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP55129327A JPS5754999A (en) 1980-09-19 1980-09-19 Voice recognition system
JP129327/1980 1980-09-19
JP76564/1980 1981-05-22
JP56076564A JPS57191786A (en) 1981-05-22 1981-05-22 Pattern recognizing system

Publications (1)

Publication Number Publication Date
CA1181525A true CA1181525A (en) 1985-01-22

Family

ID=26417702

Family Applications (1)

Application Number Title Priority Date Filing Date
CA000386244A Expired CA1181525A (en) 1980-09-19 1981-09-18 Pattern recognition method

Country Status (4)

Country Link
US (1) US4559604A (en)
AU (1) AU7529981A (en)
CA (1) CA1181525A (en)
GB (1) GB2085628B (en)

Families Citing this family (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS58130393A (en) * 1982-01-29 1983-08-03 株式会社東芝 Voice recognition equipment
JPS5997200A (en) * 1982-11-26 1984-06-04 株式会社日立製作所 Voice recognition system
JPS59178587A (en) * 1983-03-30 1984-10-09 Nec Corp Speaker confirming system
US4712242A (en) * 1983-04-13 1987-12-08 Texas Instruments Incorporated Speaker-independent word recognizer
US4712243A (en) * 1983-05-09 1987-12-08 Casio Computer Co., Ltd. Speech recognition apparatus
JPS59216284A (en) * 1983-05-23 1984-12-06 Matsushita Electric Ind Co Ltd Pattern recognizing device
US4860358A (en) * 1983-09-12 1989-08-22 American Telephone And Telegraph Company, At&T Bell Laboratories Speech recognition arrangement with preselection
US4852171A (en) * 1984-11-09 1989-07-25 Alcatel Usa Corp. Apparatus and method for speech recognition
US4718094A (en) * 1984-11-19 1988-01-05 International Business Machines Corp. Speech recognition system
GB2179483B (en) * 1985-08-20 1989-08-02 Nat Res Dev Apparatus and methods for analysing data arising from conditions which can be represented by finite state machines
US4783803A (en) * 1985-11-12 1988-11-08 Dragon Systems, Inc. Speech recognition apparatus and method
US4827521A (en) * 1986-03-27 1989-05-02 International Business Machines Corporation Training of markov models used in a speech recognition system
US4843562A (en) * 1987-06-24 1989-06-27 Broadcast Data Systems Limited Partnership Broadcast information classification system and method
US4926488A (en) * 1987-07-09 1990-05-15 International Business Machines Corporation Normalization of speech by adaptive labelling
US5054074A (en) * 1989-03-02 1991-10-01 International Business Machines Corporation Optimized speech recognition system and method
DE69030869T2 (en) * 1989-12-29 1997-10-16 Canon Kk Image processing method for evaluating objects and device for quality inspection to carry out the method
EP0441176B1 (en) * 1990-01-25 1994-03-30 Mitsubishi Jidosha Kogyo Kabushiki Kaisha System for controlling the output power of motor vehicle
JPH04194999A (en) * 1990-11-27 1992-07-14 Sharp Corp Dynamic planning method using learning
US5440742A (en) * 1991-05-10 1995-08-08 Siemens Corporate Research, Inc. Two-neighborhood method for computing similarity between two groups of objects
US5428788A (en) * 1991-05-10 1995-06-27 Siemens Corporate Research, Inc. Feature ratio method for computing software similarity
US5485621A (en) * 1991-05-10 1996-01-16 Siemens Corporate Research, Inc. Interactive method of using a group similarity measure for providing a decision on which groups to combine
EP0513652A2 (en) * 1991-05-10 1992-11-19 Siemens Aktiengesellschaft Method for modelling similarity function using neural network
US5438676A (en) * 1991-05-10 1995-08-01 Siemens Corporate Research, Inc. Method for adapting a similarity function for identifying misclassified software objects
US5317741A (en) * 1991-05-10 1994-05-31 Siemens Corporate Research, Inc. Computer method for identifying a misclassified software object in a cluster of internally similar software objects
US5402520A (en) * 1992-03-06 1995-03-28 Schnitta; Bonnie S. Neural network method and apparatus for retrieving signals embedded in noise and analyzing the retrieved signals
EP0608656A3 (en) * 1993-01-29 1994-11-23 Ibm Method and apparatus for optical character recognition utilizing combinatorial hypothesis testing.
JP2673871B2 (en) * 1993-08-26 1997-11-05 日本アイ・ビー・エム株式会社 Method and device for pattern recognition by neural network
JP2692581B2 (en) * 1994-06-07 1997-12-17 日本電気株式会社 Acoustic category average value calculation device and adaptation device
US5657424A (en) * 1995-10-31 1997-08-12 Dictaphone Corporation Isolated word recognition using decision tree classifiers and time-indexed feature vectors
JP2002222083A (en) * 2001-01-29 2002-08-09 Fujitsu Ltd Device and method for instance storage
US6915258B2 (en) * 2001-04-02 2005-07-05 Thanassis Vasilios Kontonassios Method and apparatus for displaying and manipulating account information using the human voice
JP2007047575A (en) * 2005-08-11 2007-02-22 Canon Inc Pattern matching method and device therefor, and speech information retrieval system
KR100808775B1 (en) 2006-07-26 2008-03-07 한국정보통신대학교 산학협력단 System and method for speech recognition using Class-based histogram equalization
JP4867654B2 (en) * 2006-12-28 2012-02-01 日産自動車株式会社 Speech recognition apparatus and speech recognition method
US20110184723A1 (en) * 2010-01-25 2011-07-28 Microsoft Corporation Phonetic suggestion engine
US9348479B2 (en) 2011-12-08 2016-05-24 Microsoft Technology Licensing, Llc Sentiment aware user interface customization
US9378290B2 (en) 2011-12-20 2016-06-28 Microsoft Technology Licensing, Llc Scenario-adaptive input method editor
CN104428734A (en) 2012-06-25 2015-03-18 微软公司 Input method editor application platform
US8959109B2 (en) 2012-08-06 2015-02-17 Microsoft Corporation Business intelligent in-document suggestions
JP6122499B2 (en) 2012-08-30 2017-04-26 マイクロソフト テクノロジー ライセンシング,エルエルシー Feature-based candidate selection
CN105580004A (en) 2013-08-09 2016-05-11 微软技术许可有限责任公司 Input method editor providing language assistance

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB981154A (en) * 1961-03-20 1965-01-20 Nippon Telegraph & Telephone Improved phonetic typewriter system
GB1243969A (en) * 1967-11-15 1971-08-25 Emi Ltd Improvements relating to pattern recognition devices
US3679830A (en) * 1970-05-11 1972-07-25 Malcolm R Uffelman Cohesive zone boundary detector
FR2150174A5 (en) * 1971-08-18 1973-03-30 Dreyfus Jean
JPS5517988B2 (en) * 1974-06-05 1980-05-15
JPS5272504A (en) * 1975-12-15 1977-06-17 Fuji Xerox Co Ltd Device for recognizing word audio
JPS5529803A (en) * 1978-07-18 1980-03-03 Nippon Electric Co Continuous voice discriminating device
DE2844156A1 (en) * 1978-10-10 1980-04-24 Philips Patentverwaltung METHOD FOR VERIFYING A SPEAKER
US4394538A (en) * 1981-03-04 1983-07-19 Threshold Technology, Inc. Speech recognition system and method
US4400828A (en) * 1981-03-27 1983-08-23 Bell Telephone Laboratories, Incorporated Word recognizer

Also Published As

Publication number Publication date
US4559604A (en) 1985-12-17
AU7529981A (en) 1982-03-25
GB2085628A (en) 1982-04-28
GB2085628B (en) 1984-06-06

Legal Events

Date Code Title Description
MKEX Expiry