CA2218196C - Method and apparatus for automatically generating a speech recognition vocabulary from a white pages listing - Google Patents

Method and apparatus for automatically generating a speech recognition vocabulary from a white pages listing

Info

Publication number
CA2218196C
CA2218196C
Authority
CA
Canada
Prior art keywords
orthography
word
data indicative
data
speech recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CA002218196A
Other languages
French (fr)
Other versions
CA2218196A1 (en)
Inventor
Vishwa Gupta
Michael Sabourin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nortel Networks Ltd
Original Assignee
Nortel Networks Ltd
Nortel Networks Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nortel Networks Ltd, Nortel Networks Corp filed Critical Nortel Networks Ltd
Publication of CA2218196A1 publication Critical patent/CA2218196A1/en
Application granted granted Critical
Publication of CA2218196C publication Critical patent/CA2218196C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/487Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4931Directory assistance systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L2015/228Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M2201/00Electronic components, circuits, software, systems or apparatus used in telephone systems
    • H04M2201/40Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10STECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00Data processing: database and file management or data structures
    • Y10S707/99941Database schema or data structure
    • Y10S707/99943Generating database or data structure, e.g. via user interface

Abstract

The invention relates to a method and apparatus for automatically generating a speech recognition vocabulary for a speech recognition system from a listing that contains a number of entries, each entry containing multi-word identification data that distinguishes that entry from other entries in the listing. The method comprises the steps of creating, for each entry in the listing, a plurality of orthographies in the speech recognition vocabulary that are formed by combining selected words from the entry. The combination of words is effected by applying a heuristics model that mimics the way users formulate requests to the automated directory assistance system. The method is particularly useful for generating speech recognition vocabularies for automated directory assistance systems.

Description

TITLE: METHOD AND APPARATUS FOR AUTOMATICALLY GENERATING
A SPEECH RECOGNITION VOCABULARY FROM A WHITE PAGES LISTING
FIELD OF THE INVENTION
This invention relates to a method and an apparatus for use in a speech recognition system. It is particularly applicable to a method and an apparatus for automatically providing desired information in response to spoken requests, as may be used to partially or fully automate telephone directory assistance functions.
BACKGROUND OF THE INVENTION
In addition to providing printed telephone directories, telephone companies provide telephone directory assistance services. Users of these services call predetermined telephone numbers and are connected to directory assistance operators. The operators access directory databases to locate the directory listings requested by the users, and release the telephone numbers of those listings to the users.
Because telephone companies handle a very large number of directory assistance calls per year, the associated labor costs are very significant. Consequently, telephone companies and telephone equipment manufacturers have devoted considerable effort to the development of systems which reduce the labor costs associated with providing directory assistance services.
In a typical directory assistance system the caller is first prompted to provide listing information, in other words to specify the area in which the business or individual whose telephone number he seeks resides. If valid speech is detected, the speech recognition layer is invoked in an attempt to recognize the unknown utterance. On a first pass search, a fast match algorithm is used to select the top N orthography groups from a speech recognition dictionary. In a second pass the individual orthographies from the selected groups are re-scored using more precise likelihoods. The top orthography in each of the top two groups is then processed by a rejection algorithm which evaluates whether they are sufficiently distinctive from one another that the top choice candidate can be considered to be a valid recognition.
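The two-pass search can be pictured with a short sketch. The function below is only illustrative: the scoring functions are passed in as parameters, and the N-best cut-off and rejection margin are assumed values, not figures taken from the patent.

```python
# Illustrative sketch of the two-pass search described above. The scoring
# functions and the rejection margin are assumptions, not the actual
# directory assistance implementation.

def recognize(utterance, orthography_groups, fast_score, precise_score,
              n_best=10, rejection_margin=0.5):
    """Return the top orthography if it is sufficiently distinctive, else None."""
    # First pass: a cheap "fast match" score ranks whole orthography groups.
    ranked_groups = sorted(orthography_groups,
                           key=lambda group: fast_score(utterance, group),
                           reverse=True)[:n_best]

    # Second pass: re-score the individual orthographies of the retained
    # groups with a more precise likelihood, keeping the best of each group.
    best_per_group = sorted(
        (max((precise_score(utterance, o), o) for o in group)
         for group in ranked_groups),
        reverse=True)

    # Rejection: accept the top choice only if it clearly beats the runner-up.
    (top_score, top_orth), (second_score, _second) = best_per_group[:2]
    if top_score - second_score >= rejection_margin:
        return top_orth
    return None
```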
Usually the speech recognition dictionary that contains the orthographies potentially recognizable by the speech recognition layer on the basis of a spoken utterance by a user is created from screened tokens. These are actual call records in which the spoken requests by users are stored. This information makes it possible to determine how people verbally formulate requests in connection with a certain entity whose telephone number is being sought. For example, the business "The First Wall of Glass Company" on Wilmington street may be requested in a variety of different ways, such as "Wall of Glass", "First Wall of Glass on Wilmington", "First Wall of Glass Company" and "Wall of Glass on Wilmington", among others. After examining the different formulations associated with the entity, orthographies are created, where each orthography corresponds to a particular request formulation. Thus, for each entry in the white pages a number of orthographies are produced that collectively form a speech recognition vocabulary. In practice, not all possible formulations are retained, to avoid creating an extremely large speech recognition vocabulary that may not be able to offer real-time performance. Only the formulations that occur the most often are retained.

Although the use of screened tokens makes it possible to construct a precise speech recognition vocabulary that reflects well the manner in which spoken requests are formulated, this method is time consuming and very expensive to put in practice. Indeed, a large number of screened tokens, on the order of 50,000 to 100,000, are required to construct a useful vocabulary.

Thus, there exists a need in the industry to develop automated methods for generating a speech recognition vocabulary that at least partially reduce the reliance on screened tokens to complete this task.

An object of the present invention is to provide a method for generating a speech recognition vocabulary from a listing containing a plurality of entries.

Another object of the present invention is to provide an apparatus for generating a speech recognition vocabulary from a listing containing a plurality of entries.

A further object of the invention is a computer readable medium containing a program element that instructs a computer to process a listing to generate a speech recognition vocabulary.
The present inventors have made the unexpected discovery that a useful speech recognition vocabulary may be automatically created by applying to a listing a heuristics model that simulates the manner in which spoken requests can be made. Typically, the listing contains entries, names of individuals for example, that the speech recognition system can potentially identify based on a spoken utterance by the user. In a specific example, the listing may be a white pages list that is a source of information associating an entity name, such as an individual or a business, with a telephone number or some pointer leading to a telephone number. Most preferably, the white pages also provide civic address information for each entity. In essence, the heuristics model observes the different words from which the entry in the listing is constructed and combines those words in different ways to create orthographies that mimic the way a query of that particular entry is likely to be made.
According to one aspect the invention provides a method for generating a speech recognition dictionary for use in a speech recognition system, the method comprising the steps of: providing a machine readable medium containing a listing of a plurality of entity identifiers, each entity identifier including at least one word that symbolizes a particular meaning, said plurality of entity identifiers being distinguishable from one another based on either one of individual words and combinations of individual words, at least some of said entity identifiers including at least two separate words; processing said machine readable medium by a computing device for generating for at least some of said entity identifiers an orthography set including a plurality of orthographies, each orthography being a representation of a spoken utterance, each orthography in a given orthography set being a composition of different words and at least one of said different words being selected from a respective entity identifier;
transcribing said orthography set in the form of data elements forming a data structure capable of being processed by a speech recognition system characterized by an input for receiving a signal derived from a spoken utterance, said speech recognition system being capable of processing the signal and the data structure to select a data element corresponding to an orthography likely to match the spoken utterance and performing a determined action on the basis of the data element likely to match the spoken utterance selected by the speech recognition system; storing said data structure on a computer readable medium.
In a preferred embodiment, the above defined method is used to generate a speech recognition vocabulary for use in an automated directory assistance system. The list of entity identifiers is a white pages listing that is a database providing for each entry information such as name and civic address. The particular heuristics model selected to generate the orthography set for each entry combines different words from the various database fields to produce individual orthographies having different levels of expansion, i.e., containing different information. In a very specific example, one orthography may consist of the first word of the name field. Another orthography may consist of the full name of the entity. Yet another orthography may be formed by associating the full name and some elements of the civic address, such as the street name.
In most instances, each orthography of a given set will share at least one word with another orthography of the set. This, however, is not a critical feature of the invention, as it is quite possible to develop heuristics models that produce an orthography set where no common word is shared between a certain pair of orthographies.
The list containing the entity identifiers is in a computer readable format so it may be processed by a computer programmed with the selected heuristics model to generate the speech recognition vocabulary. The particular format in which the various words forming the entity identifiers are stored or represented in the computer readable medium is not critical to the invention.

In another embodiment of the invention the entity identifier includes title information in addition to the name and civic address data. The title information is used by the particular heuristics model to develop orthographies that contain the title of the particular entity. In a specific example, an entity identifier may include the following information elements:

FULL NAME          STREET     NUMBER    LOCALITY
Little Red Cars    Bottomo    987       Sunshine Beach

After applying the following heuristics model:

Full Name
Full Name + Street
Full Name + Locality
Full Name + Street + Locality

the set of orthographies will be as follows:

Little Red Cars
Little Red Cars on Bottomo
Little Red Cars in Sunshine Beach
Little Red Cars on Bottomo in Sunshine Beach

In a different example the entity identifier includes the following information elements:

FULL NAME     STREET    NUMBER    LOCALITY    TITLE
Bill Titus    Smart     1234      Montreal    Attorney

After applying the following heuristics model:

Full Name
Full Name + Street
Full Name + Locality
Full Name + Street + Locality
Full Name + Title
Full Name + Title + Street

the set of orthographies will be as follows:

Bill Titus
Bill Titus on Smart
Bill Titus in Montreal
Bill Titus on Smart in Montreal
Bill Titus Attorney
Bill Titus Attorney on Smart

In the above examples it will be noted that the orthographies in each set share among them at least one common word, namely "Little", "Red" and "Cars" in the earlier example, while in the latter example "Bill" and "Titus" are common words.
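A minimal sketch of such a template-based heuristics model is given below. The field names and the template strings (including the "on"/"in" connectives) are assumptions chosen only to reproduce the two example orthography sets above.

```python
# Minimal sketch of a template-based heuristics model. Field names and
# templates are assumptions that merely reproduce the two examples above.

TEMPLATES = [
    "{name}",
    "{name} on {street}",
    "{name} in {locality}",
    "{name} on {street} in {locality}",
    "{name} {title}",
    "{name} {title} on {street}",
]

def orthography_set(entry):
    """Expand one entity identifier into its set of candidate orthographies."""
    orthographies = []
    for template in TEMPLATES:
        try:
            orthographies.append(template.format(**entry))
        except KeyError:
            continue    # skip templates that need a field this entry lacks
    return orthographies

print(orthography_set({"name": "Little Red Cars", "street": "Bottomo",
                       "locality": "Sunshine Beach"}))
print(orthography_set({"name": "Bill Titus", "street": "Smart",
                       "locality": "Montreal", "title": "Attorney"}))
```

An entry without a title simply skips the title-bearing templates, which yields the four orthographies of the first example and the six of the second.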
The heuristics model used to generate the orthography sets may be simple or of a more complex nature.
For example, the model may be such as to generate a first orthography based on the first word in the entity identifier and a second orthography that is a combination of the first and second words of the identifier. For example, applying this heuristics model to the entity identifier "Computer Associates company" will generate the first orthography "Computer", a second orthography "Computer Associates", etc.
This model can be refined by ignoring certain words that may be meaningless by themselves. Words such as "First" or "The" do not convey sufficient information when used alone. Thus, to avoid introducing orthographies unlikely to lead to correct automation, no orthographies based solely on these individual words are used. In those circumstances the first orthography will comprise at least a pair of words. For example, "The first machine industry" will generate the orthographies "First machine", "First machine industry", etc.
The invention also provides an apparatus for generating a speech recognition vocabulary for use in a speech recognition system, said apparatus comprising: first memory means for holding a listing of a plurality of entity identifiers, each entity identifier including at least one word that symbolizes a particular meaning, said plurality of entity identifiers being distinguishable from one another based on either one of individual words and combinations of individual words, at least some of said entity identifiers including at least two separate words; a processor in operative relationship with said first memory means; a computer readable medium containing a program element, wherein the program element provides means for:
a) generating for at least some of said entity identifiers an orthography set including a plurality of orthographies, each said orthography in a given orthography set being a composition of different words and at least one of said different words being selected from a respective entity identifier; b) transcribing said orthography set in the form of data elements forming a data structure capable of being processed by a speech recognition system characterized by an input for receiving a signal derived from a spoken utterance, said speech recognition system being capable of processing the signal and the data structure to select a data element corresponding to an orthography likely to match the spoken utterance and performing a determined action on the basis of the data element likely to match the spoken utterance selected by the speech recognition system.
The invention also provides a machine readable medium containing a program element for instructing a computer to generate a speech recognition vocabulary for use in a speech recognition system, said computer including:
first memory means for holding a listing of a plurality of entity identifiers, each entity identifier including at least one word that symbolizes a particular meaning, said plurality of entity identifiers being distinguishable from one another based on either one of individual words and combinations of individual words, at least some of said entity identifiers including at least two separate words; a processor in operative relationship with said first memory means; the program element providing means for:
a) generating for at least some of said entity identifiers an orthography set including a plurality of orthographies, each said orthography in a given orthography set being a composition of different words and at least one of said different words being selected from a respective entity identifier; b) transcribing said orthography set in the form of data elements forming a data structure capable of being processed by a speech recognition system characterized by an input for receiving a signal derived from a spoken utterance, said speech recognition system being capable of processing the signal and the data structure to select a data element corresponding to an orthography likely to match the spoken utterance and performing a determined action on the basis of the data element likely to match the spoken utterance selected by the speech recognition system.
As embodied and broadly described herein the invention also provides a machine readable medium containing a speech recognition vocabulary generated by the above described method or apparatus.
In another aspect, the invention provides a speech recognition system having a memory which contains a speech recognition vocabulary representing a plurality of orthographies, said speech recognition vocabulary generated by: providing a first computer readable medium containing a listing of a plurality of entity identifiers wherein each entity identifier comprises at least one word that symbolizes a particular meaning, said plurality of entity identifiers being distinguishable from one another based on either one of individual words and combinations of individual words, at least some of said entity identifiers including at least two separate words; generating for at least some of said entity identifiers an orthography set including a plurality of orthographies, each said orthography in a given orthography set being a composition of different words and at least one of said different words being selected from a respective entity identifier; storing said orthography set on a second computer readable medium in a format such that the orthographies of said orthography set are recognizable by a speech recognition system on a basis of a spoken utterance by a user.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 illustrates a white pages listing entry corresponding to a business organization;
Figure 2 is a general flow chart of the process for expanding abbreviations in the white pages listing;
Figure 3 is a functional block diagram of an apparatus for generating a speech recognition vocabulary from a white pages listing; and Figure 4 is a general flow chart of the process for generating the speech recognition vocabulary from a white pages listing.
DESCRIPTION OF A PREFERRED EMBODIMENT
As an introductory comment it should be pointed out that the invention does not directly relate to the structure and operation of an automated directory assistance system. Rather, the invention is concerned with a method and apparatus for generating a speech recognition vocabulary that can be used in a speech recognition system, such as an automated directory assistance system, from a listing of entities potentially recognizable or identifiable by the speech recognition system. For more information on the structure and detailed operation of an automated directory assistance system the reader may refer to the following documents:
U.S. PATENTS

Patent #     Inventor
5,488,652    Gregory J. Bielby et al.
4,164,025    Dubnowski et al.
4,751,737    Gerson et al.
4,797,910    Daudelin
4,959,855    Daudelin
4,979,206    Padden et al.
5,050,215    Nishimura
5,052,038    Shepard
5,091,947    Ariyoshi et al.
5,097,509    Lennig
5,127,055    Larkey
5,163,083    Dowden et al.
5,181,237    Dowden
5,204,894    Darden
5,274,695    Green
5,515,475    Gupta et al.
5,307,444    Tsuboka
4,751,736    Gupta et al.
5,226,044    Gupta et al.
4,956,865    Lennig et al.
5,390,278    Gupta et al.
5,086,479    Takenaga et al.

OTHER PUBLICATIONS

"Dynamic Adaptation of Hidden Markov Model for Robust Speech Recognition", 1989 IEEE International Symposium on Circuits and Systems, vol. 2, May 1989, pp. 1336-1339.
"Dynamic Modification of the Vocabulary of a Speech Recognition Machine", IBM Technical Disclosure Bulletin, vol. 27, No. 7A, Dec. 1984.
Gorin et al., "Adaptive Acquisition of Language", Computer Speech and Language, vol. 5, No. 2, Apr. 1991, London, GB.
Lennig et al., "Automated Bilingual Directory Assistance Trial in Bell Canada", IEEE Workshop on Interactive Voice Technology for Telecom Applications, Piscataway, NJ, Oct. 1992.
Labov and Lennig, "Unleashing the Potential of Human-to-Machine Communication", Telesis, Issue 97, 1993, pp. 23-27.
Rabiner and Juang, "An Introduction to Hidden Markov Models", IEEE ASSP Magazine, Jan. 1986, pp. 4-16.
Lennig, "Putting Speech Recognition to Work in the Telephone Network", Computer, published by the IEEE Computer Society, vol. 23, No. 8, Aug. 1990.
Lennig et al., "Flexible Vocabulary Recognition of Speech Over the Telephone", IEEE Workshop on Interactive Voice Technology for Telecom Applications, Piscataway, NJ, Oct.
Nagata et al., "Mobile Robot Control by a Structural Hierarchical Neural Network", pp. 69-76, 1989.
Steven Young, "Large Vocabulary Continuous Speech Recognition: a Review", IEEE Automatic Speech Recognition Workshop, September 16, 1995.
Matthew Lennig, "Putting Speech Recognition to Work in the Telephone Network", reprinted from Computer, IEEE (August 1990).

The raw data input to the speech recognition dictionary builder is, as mentioned earlier, an electronic version of the white pages. The electronic white pages provide detailed listing information, analogous to the printed version of the white pages. A sample listing is given below:

ANALYSIS INSTITUTE OF COLORADO
    Office Locations
        5800 E Eldridge Av, DENVER ................ 3036220396
        6169 S Beacon Wy, LITTLETON ............... 3032883963
        8402 Galbraith, WESTMINSTER ............... 3030579821
        200 W Country Line Rd, HIGHLANDS RANCH .... 3034492001
        2020 Wadsworth Blvd, LAKEWOOD ............. 3039924286
    Business Office, 5800 E Eldridge Av, DENVER ... 3036221423
    Analysis Lab, 5800 E Eldridge Av, DENVER
    Day Or Night Call, DENVER ..................... 3036224455

Figure 1 graphically illustrates the structure of this business organization. The electronic representation of this sample listing is given in the following table:

(Table: electronic representation of the sample listing above, giving for each record of the caption set its name, street address, locality and telephone number fields, together with its position in the caption set tree.)
Each white pages caption set can be represented as a "tree" structure, as shown in the above table: the top-line listing is the root of the tree, and the sub-listings are nodes of the tree. The structure embedded in the white pages caption set specifies the topology of the caption set tree.
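A caption set of this kind can be held in memory as a small tree. The class below is only a sketch with assumed field names; the records shown are taken from the sample listing above.

```python
# Sketch of a caption set held as a tree: the top-line listing is the root
# and every sub-listing is a child node (field names are assumptions).

class CaptionNode:
    def __init__(self, name, street="", locality="", phone=""):
        self.name = name
        self.street = street
        self.locality = locality
        self.phone = phone
        self.children = []          # sub-listings attached to this listing

    def add(self, child):
        self.children.append(child)
        return child

root = CaptionNode("Analysis Institute Of Colorado")
offices = root.add(CaptionNode("Office Locations"))
offices.add(CaptionNode("Office Locations", "5800 E Eldridge Av", "Denver", "3036220396"))
root.add(CaptionNode("Business Office", "5800 E Eldridge Av", "Denver", "3036221423"))
root.add(CaptionNode("Day Or Night Call", locality="Denver", phone="3036224455"))
```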

It should be noted that the various fields for each record are merely examples of the type of information that is available. The following table provides a more complete, although not exhaustive, list of the various fields of information.

FIELD                         EXAMPLE
surname field                 <kubrick>
subsequent name field         <stanley>
professional title            <doctor>
lineage assigned to name      <jr.>
license, academic degrees     <PhD>
business description          <master plumber>
building number               <16>
building number prefix        <N12->
building number postfix       <-A>
street name                   <armand bombardier>
street directional prefix     <north>
street thoroughfare type      <boulevard>
street directional postfix    <east>
locality                      <saint lambert>
state or province             <texas>
county                        <monteregie>
BUILDER

This section describes methods of building the speech recognition lexicon. The lexicon is later phonemically transcribed, mapped into a set of acoustic models, and a speech recognition dictionary is created. Each lexical item, or phrase, attempts to mimic the manner in which directory assistance queries are made. Phrases are generated using heuristics. More specifically, heuristics generate recognition phrases using the text contained in the electronic white pages. However, before processing the white pages entries with heuristics to create the speech recognition lexicon, the white pages data is pre-processed, which corresponds to a "cleaning operation" involving the expansion of abbreviations and the removal of non-productive information.
The expansion of abbreviations is effected by using a substitution table. The program which generates the orthographies from the white pages listing observes a number of fields in the listing for occurrences of specific letter combinations known to be abbreviations.

FIELD            COMBINATION OF LETTERS    SUBSTITUTED WORD(S)
Name             agcy                      agency
Title            atty                      attorney
Title            MD                        Doctor
Street Prefix    S                         south
Street Prefix    N                         north
Street Prefix    E                         east
Street Prefix    W                         west
Locality         Bouldr                    Boulder
Locality         Mtl                       Montreal

This table provides only an example of possible abbreviations and the substituted words that the system can use.
In practice, the expansion of the abbreviations is not a complex operation, because what the program is looking for, namely the different abbreviations, is usually known in advance, as the abbreviations are established according to a standard practice of the telephone company. Moreover, the database fields where the abbreviations are likely to be found are usually also known. Thus, it suffices to scan the white pages database looking for all possible abbreviations. Once an abbreviation is located, a substitution table is consulted to find the expansion corresponding to the given abbreviation. That expansion is then inserted in place of the original abbreviation. This process is illustrated in Figure 2 of the drawings. The program element responsible for the expansion of abbreviations searches for possible abbreviations that are known in advance in the fields of the database where those abbreviations are likely to be found. When an abbreviation is located at step 12, a substitution table is consulted to determine the substitution word. The latter is then inserted in the database. The program next fetches the next record of the database and the process is repeated until all the records have been examined.
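A sketch of this expansion pass is shown below. The substitution table simply restates the example table above; the per-field dictionary layout and field names are assumptions.

```python
# Sketch of the abbreviation expansion pass. The substitution table restates
# the example above; the field-keyed layout is an assumption.

SUBSTITUTIONS = {
    "name":          {"agcy": "agency"},
    "title":         {"atty": "attorney", "MD": "Doctor"},
    "street_prefix": {"S": "south", "N": "north", "E": "east", "W": "west"},
    "locality":      {"Bouldr": "Boulder", "Mtl": "Montreal"},
}

def expand_abbreviations(record):
    """Replace known abbreviations, field by field, in one white pages record."""
    for field, table in SUBSTITUTIONS.items():
        if field in record:
            words = record[field].split()
            record[field] = " ".join(table.get(word, word) for word in words)
    return record

print(expand_abbreviations({"name": "Smith agcy", "title": "atty",
                            "street_prefix": "N", "locality": "Mtl"}))
```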

The final step of the "cleaning operation" consists of removing extraneous information that is non-productive in the sense that it does not provide any additional information about the particular entity. This process is also effected by scanning the white pages listing database and looking for particular letter combinations or key words. Once such a letter combination or key word is located, it is simply erased. Examples of what is considered non-productive information are as follows: "Toll free number", "24 hour service", "Day or night service", among any other possible letters or words that may be considered superfluous. This operation is effected by using a program element of the type illustrated in Figure 2. Each field of the database is scanned to locate pre-determined words or phrases, and when one of these words or phrases is located it is erased.
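The removal step can be sketched in the same spirit; the phrase list below just restates the examples given in the text.

```python
# Sketch of the second cleaning step: erasing non-productive phrases from a
# field. The phrase list restates the examples in the text.

NON_PRODUCTIVE_PHRASES = ["toll free number", "24 hour service", "day or night service"]

def remove_non_productive(text):
    """Erase superfluous phrases that add no information about the entity."""
    for phrase in NON_PRODUCTIVE_PHRASES:
        index = text.lower().find(phrase)
        if index >= 0:
            text = text[:index] + text[index + len(phrase):]
    return " ".join(text.split())       # tidy up any leftover whitespace

print(remove_non_productive("Acme Plumbing 24 Hour Service"))   # -> Acme Plumbing
```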
The heuristics model used to generate the orthographies of the speech recognition lexicon may vary with the intended application. For example, the heuristics for a simple listing can be as follows:

First word in name field
Full name field
Full name field and street name
Full name field and street name less street thoroughfare type information
etc.

For example, consider the following entity identifier:

TYPE OF DATA        DATA
ID Number           28724
Full Name           ABC American Ship Building Company
Building Number     909
Street              Wadsworth
Street Type         BLVD
Telephone Number    5146766656

Application of the heuristics model (described above) generates the following orthography set:

ABC
ABC American Ship Building Company
ABC American Ship Building Company on Wadsworth Boulevard
ABC American Ship Building Company on Wadsworth

This orthography set includes four individual orthographies, each one pointing toward the same telephone number. The first two orthographies contain words related only to the name of the entity, while the last two orthographies are a combination, in other words including information relating to the name of the entity and to the civic address. The word "ABC" is common to all orthographies.
In some instances, the 26 first word alone in the name field of the entity name may 27 by itself be meaningless.
For example in the business 28 "First American ship building company", an orthography 29 containing solely the word "First"
may not be very useful because it is unlikely that the user will request the 31 telephone number of that company solely by saying "First".

32 Thus, the model can be refined in two possible ways.
One 1 is to identify those non-productive words in an entity 2 name and when a word in the pre-established group is 3 encountered it is not considered. In this example, the 4 program building the speech recognition vocabulary looks for "First" at the beginning of the name and if that word 6 is found it is ignored and only the second word in the 7 name is used to build the first orthography in the set.
8 The second possibility is to create orthographies that 9 include at least a pair of words from the name of the entity, namely the first and the second words. If this 11 heuristics model is applied to the example given above, an 12 orthography set including only three orthographies will be 13 generated, excluding the orthography "ABC."

The table below provides an example of a caption set that is related to a hierarchical organizational structure of the type illustrated in Figure 1. The heuristics model used to generate the orthography groups also takes into account the rank number field in the white pages database that provides information as to whether the entry is at a root level or constitutes a branch.

(Table: example caption set for "First American Ship Building Company", giving the rank number, name, street and telephone number fields of the root listing and its sub-listings.)
N M ~ ~n ~D t~ 00 2 The following heuristics model will generate the 3 orthography group below:

First two words in "Rank 0" name field 6 Full "Rank 0" name field 7 Full "Rank 0" name field and full "Rank 1" name field 8 Full "Rank 0" name field and full "Rank 1" street name 9 Full "Rank 0" name field and "Rank 1" street name less street thoroughfare type information etc.

12 First American 13 First American Ship Building Company 14 First American Ship Building Company-Accounts and Personnel 16 First American Ship Building Company-Wadsworth Boulevard 17 First American Ship Building Company-Wadsworth 19 Listings with title information are treated with different heuristics. The title field in the white pages 21 entry is used to store information relating to the 22 profession of the person specified in the name field.
23 Titles include orthopedic surgeons, judges, attorneys, 24 senators, doctors, and dentists. Titles are interpreted using a substitution table. For example, the title "MD OB-26 GYN & INFERTILITY" is interpreted as "Doctor" . A common 27 variant of the usage of title information has been 28 observed: the title can occur in the final position of the 29 phrase. For example, the title at initial position "Dr."
becomes "MD" at final position, and phrase initial title 31 "Attorney" becomes phrase final "Attorney at Law". As an 32 example, the following heuristics may be applied to titled 33 listings:

TITLE + SUBSEQUENT NAME FIELD + SURNAME FIELD
TITLE + SURNAME FIELD
SUBSEQUENT NAME FIELD + SURNAME FIELD
SURNAME FIELD
SUBSEQUENT NAME FIELD + SURNAME FIELD + TITLE
For example, the listing <Surname Field: Trine>, <Subsequent Name Field: William A> and <Title: Atty> will generate the following set of orthographies, all pointing toward the same telephone number:

Attorney William A Trine
Attorney Trine
William A Trine
Trine
William A Trine, Attorney at Law
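A combined sketch of these last two variants is given below: one helper applies the rank-based caption set heuristics, the other the titled-listing heuristics. The field names, the hyphen and the "at Law" wording are assumptions chosen to reproduce the example orthography groups above.

```python
# Sketch of the caption set (rank-based) and titled-listing heuristics.
# Field names and template wording are assumptions mirroring the examples.

def caption_orthographies(root, branch):
    root_name = root["name"]
    return [
        " ".join(root_name.split()[:2]),                            # first two words
        root_name,
        f"{root_name}-{branch['name']}",
        f"{root_name}-{branch['street']} {branch['street_type']}",
        f"{root_name}-{branch['street']}",
    ]

def titled_orthographies(record):
    title, first, last = record["title"], record["subsequent_name"], record["surname"]
    final_title = "Attorney at Law" if title == "Attorney" else title  # phrase-final form
    return [
        f"{title} {first} {last}",
        f"{title} {last}",
        f"{first} {last}",
        last,
        f"{first} {last}, {final_title}",
    ]

print(caption_orthographies(
    {"name": "First American Ship Building Company"},
    {"name": "Accounts and Personnel", "street": "Wadsworth", "street_type": "Boulevard"}))
print(titled_orthographies(
    {"title": "Attorney", "subsequent_name": "William A", "surname": "Trine"}))
```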
33 Thus, the specific heuristics model used to generate the 1 speech recognition vocabulary will need to be adapted to 2 the particular application and may greatly vary from one 3 case to another. In some instances, a single heuristics 4 model may be sufficient for the entire vocabulary generation. In other applications, a combinations of 6 heuristics models may need to be used in dependence of the 7 type of white pages entries to be processed. For example, 8 if the white pages listings contain both single line 9 entries and caption sets it could be advantageous to use different heuristics models applicable to each type of 11 entry.

The apparatus for generating the speech recognition vocabulary is illustrated in Figure 3. The apparatus includes a processor 18 in operative relationship with a memory having three segments, namely a first segment 20 containing program instructions, a second segment 22 containing the white pages listing, and a third segment 24 containing the speech recognition vocabulary. The flow chart illustrating the program operation is shown in Figure 4. At step 26 a database record is fetched. At step 28 the desired heuristics model is invoked and at step 30 the set of orthographies is generated. The set of orthographies is then placed in the third memory segment 24. The speech recognition vocabulary may then be recorded on mass storage 32, if desired.
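The program operation of Figure 4 amounts to the loop sketched below; any of the heuristics sketched earlier can be passed in as heuristics_model. The function and file names are assumptions.

```python
# Sketch of the vocabulary builder's main loop (Figure 4): fetch each record,
# apply the chosen heuristics model, accumulate the orthographies, and
# optionally record the result on mass storage. Names are assumptions.

def build_vocabulary(white_pages_records, heuristics_model):
    vocabulary = []
    for record in white_pages_records:               # step 26: fetch a record
        vocabulary.extend(heuristics_model(record))  # steps 28-30: apply the model
    return vocabulary

def save_vocabulary(vocabulary, path="vocabulary.txt"):
    with open(path, "w", encoding="utf-8") as out:   # mass storage, if desired
        out.write("\n".join(vocabulary))
```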

Next, a dictionary access program of the type known in the art is invoked to generate a "phonemic" transcription for each orthography in the vocabulary. A phonemic transcription is an expressive representation of the sound patterns of a phrase using a set of 41 phoneme symbols (one symbol for each distinct sound in the English language). This phonemic transcription is transformed into articulatory transcriptions (surface forms), which capture special articulatory phenomena that depend on the context of a phoneme. Then, an acoustic transcription is generated, indicating which acoustic model (represented as a concise mathematical model) should be used during speech recognition. The vocabulary thus transcribed can now be processed by the speech recognition layer of the automated directory assistance system.
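The transcription stage can be pictured as a dictionary lookup of the kind sketched below. The two-entry phoneme dictionary is purely hypothetical; a real system would use a full pronunciation lexicon and would go on to derive the articulatory and acoustic transcriptions described above.

```python
# Sketch of the phonemic transcription step. The tiny phoneme dictionary is
# hypothetical; real systems use a full pronunciation lexicon.

PHONEME_DICT = {
    "bill":  ["b", "ih", "l"],
    "titus": ["t", "ay", "t", "ah", "s"],
}

def phonemic_transcription(orthography):
    """Concatenate per-word phoneme strings for one orthography."""
    phonemes = []
    for word in orthography.lower().split():
        phonemes.extend(PHONEME_DICT.get(word, ["<unk>"]))   # flag unknown words
    return phonemes

print(phonemic_transcription("Bill Titus"))
# -> ['b', 'ih', 'l', 't', 'ay', 't', 'ah', 's']
```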

The above description of a preferred embodiment should not be interpreted in any limiting manner since variations and refinements can be made without departing from the spirit of the invention. For instance, although an example of the invention has been provided above with strong emphasis on an automated directory assistance system, the method and apparatus for generating the speech recognition vocabulary could also be used in other types of speech recognition systems. The scope of the invention is defined in the appended claims and their equivalents.

Claims (57)

1. A method for generating a speech recognition dictionary for use in a speech recognition system, the method comprising the steps of:
providing a machine readable medium containing a listing of a plurality of entity identifiers, each entity identifier including at least one word that symbolizes a particular meaning, said plurality of entity identifiers being distinguishable from one another based on either one of individual words and combinations of individual words, at least some of said entity identifiers including at least two separate words;
processing said machine readable medium by a computing device for generating for at least some of said entity identifiers an orthography set including a plurality of orthographies, each orthography being a representation of a spoken utterance, each orthography in a given orthography set being a composition of different words and at least one of said different words being selected from a respective entity identifier;
transcribing said orthography set in the form of data elements forming a data structure capable of being processed by a speech recognition system characterized by an input for receiving a signal derived from a spoken utterance, said speech recognition system being capable of processing the signal and the data structure to select a data element corresponding to an orthography likely to match the spoken utterance and performing a determined action on the basis of the data element likely to match the spoken utterance selected by the speech recognition system;

storing said data structure on a computer readable medium.
2. A method as defined in claim 1, wherein said data element is a representation of a sound made when uttering the orthography associated with the data element.
3. A method as defined in claim 2, wherein the step of transcribing said orthography set includes the step of generating phonemic transcriptions for each orthography in said set.
4. A method as defined in claim 3, wherein the step of transcribing said orthography set further includes the step of converting the phonemic transcriptions into acoustic transcriptions.
5. A method as defined in claim 2, wherein each orthography in said orthography set shares a common word with another orthography in said orthography set.
6. A method as defined in claim 5, wherein each orthography in said orthography set includes a word that is common with every other orthography in said orthography set.
7. A method as defined in claim 2, wherein said listing is a database including a plurality of records, each record including a plurality of information fields, data stored in the information fields for a certain record constituting an entity identifier.
8. A method as defined in claim 7, wherein one of said entity identifiers includes data indicative of a name, and wherein the orthography set includes at least one orthography that includes a word common with a word included in said data indicative of a name.
9. A method as defined in claim 7, wherein one of said entity identifiers includes data indicative of a civic address, and wherein the orthography set includes at least one orthography that includes a word common with a word included in said data indicative of civic address.
10. A method as defined in claim 7, wherein one of said entity identifiers includes data indicative of a title, and wherein the orthography set includes at least one orthography that includes a word common with a word included in said data indicative of a title.
11. A method as defined in claim 7, wherein one of said entity identifiers includes data indicative of a name and data indicative of a civic address, and wherein the orthography set includes at least one orthography that includes a word common with a word included in said data indicative of a name and a word common with a word included in said data indicative of a civic address.
12. A method as defined in claim 7, wherein one of said entity identifiers includes data indicative of a name and data indicative of a title, and wherein the orthography set includes at least one orthography that includes a word common with a word included in said data indicative of a name and a word common with a word included in said data indicative of a title.
13. A method as defined in claim 7, wherein one of said information fields includes data indicative of a name, said method comprising prior to generating any orthography set effecting the steps of:
scanning said one information field to locate predetermined data;

upon identification of said predetermined data in association with a given record negating said predetermined data from said given record.
14. A method as defined in claim 13, wherein said predetermined data includes combination of words selected from a group consisting of "toll free", "day or night" and "24 hour".
15. A method as defined in claim 7, wherein one of said information fields includes data indicative of a name, said method comprising prior to generating any orthography set effecting the steps of:
scanning said one information field to locate predetermined data;
upon identification of said predetermined data in association with a given record replacing said predetermined data with new data.
16. A method as defined in claim 15, comprising the step of consulting a table that establishes a correspondence between said predetermined data and said new data to identify the new data for replacing said predetermined data.
17. A method as defined in claim 15, wherein said predetermined data is an abbreviation of a word.
18. An apparatus for generating a speech recognition vocabulary for use in a speech recognition system, said apparatus comprising:
first memory means for holding a listing of a plurality of entity identifiers, each entity identifier including at least one word that symbolizes a particular meaning, said plurality of entity identifiers being distinguishable from one another based on either one of individual words and combinations of individual words, at least some of said entity identifiers including at least two separate words;

a processor in operative relationship with said first memory means;

a computer readable medium containing a program element, wherein the program element provides means for:

a) generating for at least some of said entity identifiers an orthography set including a plurality of orthographies, each said orthography in a given orthography set being a composition of different words and at least one of said different words being selected from a respective entity identifier;

b) transcribing said orthography set in the form of data elements forming a data structure capable of being processed by a speech recognition system characterized by an input for receiving a signal derived from a spoken utterance, said speech recognition system being capable of processing the signal and the data structure to select a data element corresponding to an orthography likely to match the spoken utterance and performing a determined action on the basis of the data element likely to match the spoken utterance selected by the speech recognition system.
19. An apparatus as defined in claim 18, wherein each data element is a representation of a sound made when uttering the orthography associated with the data element.
20. An apparatus as defined in claim 19, wherein said program element provides means for generating a phonemic transcription for the orthographies in said set.
21. An apparatus as defined in claim 20, wherein said program element provides means for converting each phonemic transcription into an acoustic transcription.
22. An apparatus as defined in claim 20, wherein each orthography in said orthography set shares a common word with another orthography in said orthography set.
23. An apparatus as defined in claim 22, wherein each orthography in said orthography set includes a word that is common with every other orthography in said orthography set.
24. An apparatus as defined in claim 23, wherein said listing is a database including a plurality of records, each record including a plurality of information fields, data stored in the information fields for a certain record constituting an entity identifier.
25. An apparatus as defined in claim 24, wherein one of said entity identifiers includes data indicative of a name, said program element providing means for generating an orthography set including at least one orthography that includes a word common with a word included in said data indicative of a name.
26. An apparatus as defined in claim 24, wherein one of said entity identifiers includes data indicative of a civic address, said program element providing means for generating an orthography set including at least one orthography that includes a word common with a word included in said data indicative of civic address.
27. An apparatus as defined in claim 24, wherein one of said entity identifiers includes data indicative of a title, said program element providing means for generating an orthography set including at least one orthography that includes a word common with a word included in said data indicative of a title.
28. An apparatus as defined in claim 24, wherein one of said entity identifiers includes data indicative of a name and data indicative of a civic address, said program element providing means for generating an orthography set including at least one orthography that includes a word common with a word included in said data indicative of a name and a word common with a word included in said data indicative of a civic address.
29. An apparatus as defined in claim 24, wherein one of said entity identifiers includes data indicative of a name and data indicative of a title, said program element providing means for generating an orthography set including at least one orthography that includes a word common with a word included in said data indicative of a name and a word common with a word included in said data indicative of a title.
30. A machine readable medium containing a program element for instructing a computer to generate a speech recognition vocabulary for use in a speech recognition system, said computer including:
first memory means for holding a listing of a plurality of entity identifiers, each entity identifier including at least one word that symbolizes a particular meaning, said plurality of entity identifiers being distinguishable from one another based on either one of individual words and combinations of individual words, at least some of said entity identifiers including at least two separate words;
a processor in operative relationship with said first memory means;
the program element providing means for:
a) generating for at least some of said entity identifiers an orthography set including a plurality of orthographies, each said orthography in a given orthography set being a composition of different words and at least one of said different words being selected from a respective entity identifier;
b) transcribing said orthography set in the form of data elements forming a data structure capable of being processed by a speech recognition system characterized by an input for receiving a signal derived from a spoken utterance, said speech recognition system being capable of processing the signal and the data structure to select a data element corresponding to an orthography likely to match the spoken utterance and performing a determined action on the basis of the data element likely to match the spoken utterance selected by the speech recognition system.
31. A machine readable medium as defined in claim 30, wherein each data element is a representation of the sound made when uttering the orthography associated with the data element.
32. A machine readable medium as defined in claim 31, wherein said program element provides means for generating a phonemic transcription for the orthographies in said set.
33. A machine readable medium as defined in claim 32, wherein said program element provides means for converting the phonemic transcription into an acoustic transcription.
34. A machine readable medium as defined in claim 31, wherein each orthography in said orthography set shares a common word with another orthography in said orthography set.
35. A machine readable medium as defined in claim 34, wherein each orthography in said orthography set includes a word that is common with every other orthography in said orthography set.
36. A machine readable medium as defined in claim 33, wherein said listing is a database including a plurality of records, each record including a plurality of information fields, data stored in the information fields for a certain record constituting an entity identifier.
37. A machine readable medium as defined in claim 36, wherein one of said entity identifiers includes data indicative of a name, said program element providing means for generating an orthography set including at least one orthography that includes a word common with a word included in said data indicative of a name.
38. A machine readable medium as defined in claim 36, wherein one of said entity identifiers includes data indicative of a civic address, said program element providing means for generating an orthography set including at least one orthography that includes a word common with a word included in said data indicative of civic address.
39. A machine readable medium as defined in claim 36, wherein one of said entity identifiers includes data indicative of a title, said program element providing means for generating an orthography set including at least one orthography that includes a word common with a word included in said data indicative of a title.
40. A machine readable medium as defined in claim 36, wherein one of said entity identifiers includes data indicative of a name and data indicative of a civic address, said program element providing means for generating an orthography set including at least one orthography that includes a word common with a word included in said data indicative of a name and a word common with a word included in said data indicative of a civic address.
41. A machine readable medium as defined in claim 36, wherein one of said entity identifiers includes data indicative of a name and data indicative of a title, said program element providing means for generating an orthography set including at least one orthography that includes a word common with a word included in said data indicative of a name and a word common with a word included in said data indicative of a title.
42. A machine readable medium containing a speech recognition vocabulary generated by the method defined in claim 1.
43. A machine readable medium containing a speech recognition vocabulary generated by the method defined in claim 2.
44. A speech recognition system having a memory which contains a speech recognition vocabulary representing a plurality of orthographies, said speech recognition vocabulary generated by:
providing a first computer readable medium containing a listing of a plurality of entity identifiers wherein each entity identifier comprises at least one word that symbolizes a particular meaning, said plurality of entity identifiers being distinguishable from one another based on either one of individual words and combinations of individual words, at least some of said entity identifiers including at least two separate words;
generating for at least some of said entity identifiers an orthography set including a plurality of orthographies, each said orthography in a given orthography set being a composition of different words and at least one of said different words being selected from a respective entity identifier;
storing said orthography set on a second computer readable medium in a format such that the orthographies of said orthography set are recognizable by a speech recognition system on a basis of a spoken utterance by a user.
45. A speech recognition system as defined in claim 44, wherein each orthography in said orthography set shares a common word with another orthography in said orthography set.
46. A speech recognition system as defined in claim 45, wherein each orthography in said set includes a word that is common with every other orthography in said set.
47. A speech recognition system as defined in claim 44, wherein said listing is a database including a plurality of records, each record including a plurality of information fields, data stored in an information field for a certain record constituting an entity identifier.
48. A speech recognition system as defined in claim 47, wherein one of said entity identifiers includes data indicative of a name, and wherein the orthography set includes at least one orthography that includes a word common with a word included in said data indicative of a name.
49. A speech recognition system as defined in claim 47, wherein one of said entity identifiers includes data indicative of a civic address, and wherein the orthography set includes at least one orthography that includes a word common with a word included in said data indicative of civic address.
50. A speech recognition system as defined in claim 47, wherein one of said entity identifiers includes data indicative of a title, and wherein the orthography set includes at least one orthography that includes a word common with a word included in said data indicative of a title.
51. A speech recognition system as defined in claim 47, wherein one of said entity identifiers includes data indicative of a name and data indicative of a civic address, and wherein the orthography set includes at least one orthography that includes a word common with a word included in said data indicative of a name and a word common with a word included in said data indicative of a civic address.
52. A speech recognition system as defined in claim 47, wherein one of said entity identifiers includes data indicative of a name and data indicative of a title, and wherein the orthography set includes at least one orthography that includes a word common with a word included in said data indicative of a name and a word common with a word included in said data indicative of a title.
53. A speech recognition system as defined in claim 47, wherein one of said information fields includes data indicative of a name, said system comprising a means for effecting the steps of:
scanning said one information field to locate predetermined data;
upon identification of said predetermined data in association with a given record, deleting said predetermined data from said given record.
54. A speech recognition system as defined in claim 53, wherein said predetermined data includes a combination of words selected from the group consisting of "toll free", "day or night" and "24 hour".
55. A speech recognition system as defined in claim 47, wherein one of said information fields includes data indicative of a name, said system comprising a means for effecting the steps of:
scanning said one information field to locate predetermined data;
upon identification of said predetermined data in association with a given record, replacing said predetermined data with new data.
56. A speech recognition system as defined in claim 55, comprising a means for consulting a table that establishes a correspondence between said predetermined data and said new data to identify the new data for replacing said predetermined data.
57. A speech recognition system as defined in claim 55, wherein said predetermined data is an abbreviation of a word.
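
Claims 41 to 52 recite generating, for each directory listing, an orthography set whose members are built from words drawn from the listing's name, civic address, or title data. A minimal sketch of one way such a set might be produced is shown below; the record layout, the surname-anchored drop-word strategy, and all identifiers are illustrative assumptions, not the patented implementation.

```python
# Illustrative sketch only: one possible way to build an orthography set
# from a white pages record, loosely following claims 41-52. The field
# names, record layout and drop-word strategy are assumptions, not the
# patented implementation.

def generate_orthography_set(record: dict) -> set:
    """Return a set of alternative spoken forms (orthographies) for one listing.

    Every orthography shares at least one word with the listing's name field,
    so the variants stay mutually related while remaining distinguishable
    (compare claims 45 and 46).
    """
    name_words = record.get("name", "").split()
    address_words = record.get("address", "").split()
    title_words = record.get("title", "").split()

    if not name_words:
        return set()

    surname = name_words[0]  # white pages listings are typically surname-first
    orthographies = {" ".join(name_words)}  # the full listed name

    # Name-only variants: drop one non-surname word at a time.
    for i in range(1, len(name_words)):
        orthographies.add(" ".join(name_words[:i] + name_words[i + 1:]))

    # Name plus civic address variant (compare claims 49 and 51).
    if address_words:
        orthographies.add(" ".join([surname] + address_words))

    # Name plus title variant (compare claims 50 and 52).
    if title_words:
        orthographies.add(" ".join([surname] + title_words))

    return orthographies


if __name__ == "__main__":
    listing = {"name": "Smith John A", "address": "15 Main St", "title": "Dr"}
    for orthography in sorted(generate_orthography_set(listing)):
        print(orthography)
```

For the example listing the set would contain "Smith John A", "Smith John", "Smith A", "Smith 15 Main St" and "Smith Dr", all sharing the word "Smith".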
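
Claims 53 and 54 recite scanning a name field for predetermined word combinations such as "toll free", "day or night" and "24 hour" and deleting them from the record before vocabulary generation. The sketch below shows one way such a scrub pass could look, assuming a regular-expression match over the name field; the approach is illustrative only.

```python
import re

# Word combinations to strip from the name field before vocabulary
# generation; the three phrases are the combinations recited in claim 54.
NOISE_PHRASES = ("toll free", "day or night", "24 hour")


def scrub_name_field(name: str) -> str:
    """Delete predetermined advertising phrases from a listing's name field."""
    cleaned = name
    for phrase in NOISE_PHRASES:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        cleaned = re.sub(pattern, " ", cleaned, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by the deletions.
    return " ".join(cleaned.split())


print(scrub_name_field("Acme Plumbing 24 Hour Toll Free"))  # -> Acme Plumbing
```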
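
Claims 55 to 57 recite replacing predetermined data, for example abbreviations, with new data found by consulting a correspondence table. A minimal table-driven sketch follows; the table entries and the token-by-token matching are assumptions made for illustration.

```python
# Hypothetical correspondence table between abbreviations seen in white
# pages listings and their replacements (compare claims 56 and 57). The
# entries are illustrative; a deployed system would carry a far larger table.
ABBREVIATION_TABLE = {
    "dr": "doctor",
    "st": "saint",
    "assn": "association",
    "dept": "department",
}


def expand_abbreviations(field: str, table: dict = ABBREVIATION_TABLE) -> str:
    """Replace each abbreviated token in a field with its tabled expansion."""
    expanded = []
    for token in field.split():
        key = token.rstrip(".").lower()
        expanded.append(table.get(key, token))
    return " ".join(expanded)


print(expand_abbreviations("St. Johns Medical Assn"))  # -> saint Johns Medical association
```

A real correspondence table would also have to resolve context-dependent abbreviations (for example "St" as saint or street), which this sketch deliberately ignores.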
CA002218196A 1996-11-29 1997-10-14 Method and apparatus for automatically generating a speech recognition vocabulary from a white pages listing Expired - Fee Related CA2218196C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US08/757,610 US5839107A (en) 1996-11-29 1996-11-29 Method and apparatus for automatically generating a speech recognition vocabulary from a white pages listing
US08/757,610 1996-11-29

Publications (2)

Publication Number Publication Date
CA2218196A1 CA2218196A1 (en) 1998-05-29
CA2218196C true CA2218196C (en) 2004-12-14

Family

ID=25048513

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002218196A Expired - Fee Related CA2218196C (en) 1996-11-29 1997-10-14 Method and apparatus for automatically generating a speech recognition vocabulary from a white pages listing

Country Status (5)

Country Link
US (1) US5839107A (en)
EP (1) EP0845774B1 (en)
JP (1) JPH10229449A (en)
CA (1) CA2218196C (en)
DE (1) DE69715784T2 (en)

Families Citing this family (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6804645B1 (en) * 1996-04-02 2004-10-12 Siemens Aktiengesellschaft Dynamic phoneme dictionary for speech recognition
US6018708A (en) * 1997-08-26 2000-01-25 Nortel Networks Corporation Method and apparatus for performing speech recognition utilizing a supplementary lexicon of frequently used orthographies
US6122361A (en) * 1997-09-12 2000-09-19 Nortel Networks Corporation Automated directory assistance system utilizing priori advisor for predicting the most likely requested locality
US6125341A (en) * 1997-12-19 2000-09-26 Nortel Networks Corporation Speech recognition system and method
US6173279B1 (en) * 1998-04-09 2001-01-09 At&T Corp. Method of using a natural language interface to retrieve information from one or more data resources
US6128482A (en) * 1998-12-22 2000-10-03 General Motors Corporation Providing mobile application services with download of speaker independent voice model
US6643622B2 (en) 1999-02-19 2003-11-04 Robert O. Stuart Data retrieval assistance system and method utilizing a speech recognition system and a live operator
US6243684B1 (en) * 1999-02-19 2001-06-05 Usada, Inc. Directory assistance system and method utilizing a speech recognition system and a live operator
US6463413B1 (en) 1999-04-20 2002-10-08 Matsushita Electric Industrial Co., Ltd. Speech recognition training for small hardware devices
DE10038517A1 (en) * 2000-08-08 2002-02-21 Philips Corp Intellectual Pty Automatic recognition of company names in linguistic statements
US7401023B1 (en) 2000-09-06 2008-07-15 Verizon Corporate Services Group Inc. Systems and methods for providing automated directory assistance using transcripts
AU2002213338A1 (en) 2000-10-16 2002-04-29 Eliza Corporation Method of and system for providing adaptive respondent training in a speech recognition application
US6625600B2 (en) 2001-04-12 2003-09-23 Telelogue, Inc. Method and apparatus for automatically processing a user's communication
US6671670B2 (en) * 2001-06-27 2003-12-30 Telelogue, Inc. System and method for pre-processing information used by an automated attendant
US20030105622A1 (en) * 2001-12-03 2003-06-05 Netbytel, Inc. Retrieval of records using phrase chunking
US20030149566A1 (en) * 2002-01-02 2003-08-07 Esther Levin System and method for a spoken language interface to a large database of changing records
US7398209B2 (en) 2002-06-03 2008-07-08 Voicebox Technologies, Inc. Systems and methods for responding to natural language speech utterance
US7693720B2 (en) 2002-07-15 2010-04-06 Voicebox Technologies, Inc. Mobile systems and methods for responding to natural language speech utterance
US20050004799A1 (en) * 2002-12-31 2005-01-06 Yevgenly Lyudovyk System and method for a spoken language interface to a large database of changing records
US7447636B1 (en) * 2005-05-12 2008-11-04 Verizon Corporate Services Group Inc. System and methods for using transcripts to train an automated directory assistance service
US7640160B2 (en) 2005-08-05 2009-12-29 Voicebox Technologies, Inc. Systems and methods for responding to natural language speech utterance
US7620549B2 (en) 2005-08-10 2009-11-17 Voicebox Technologies, Inc. System and method of supporting adaptive misrecognition in conversational speech
US7949529B2 (en) 2005-08-29 2011-05-24 Voicebox Technologies, Inc. Mobile systems and methods of supporting natural language human-machine interactions
EP1934971A4 (en) * 2005-08-31 2010-10-27 Voicebox Technologies Inc Dynamic speech sharpening
US8498999B1 (en) * 2005-10-14 2013-07-30 Wal-Mart Stores, Inc. Topic relevant abbreviations
US8073681B2 (en) 2006-10-16 2011-12-06 Voicebox Technologies, Inc. System and method for a cooperative conversational voice user interface
JP4859642B2 (en) * 2006-11-30 2012-01-25 富士通株式会社 Voice information management device
US7860707B2 (en) * 2006-12-13 2010-12-28 Microsoft Corporation Compound word splitting for directory assistance services
US7818176B2 (en) 2007-02-06 2010-10-19 Voicebox Technologies, Inc. System and method for selecting and presenting advertisements based on natural language processing of voice-based input
US8140335B2 (en) 2007-12-11 2012-03-20 Voicebox Technologies, Inc. System and method for providing a natural language voice user interface in an integrated voice navigation services environment
US8589161B2 (en) 2008-05-27 2013-11-19 Voicebox Technologies, Inc. System and method for an integrated, multi-modal, multi-device natural language voice services environment
US9305548B2 (en) 2008-05-27 2016-04-05 Voicebox Technologies Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
US8326637B2 (en) 2009-02-20 2012-12-04 Voicebox Technologies, Inc. System and method for processing multi-modal device interactions in a natural language voice services environment
US9171541B2 (en) 2009-11-10 2015-10-27 Voicebox Technologies Corporation System and method for hybrid processing in a natural language voice services environment
US9502025B2 (en) 2009-11-10 2016-11-22 Voicebox Technologies Corporation System and method for providing a natural language content dedication service
EP3195145A4 (en) 2014-09-16 2018-01-24 VoiceBox Technologies Corporation Voice commerce
WO2016044321A1 (en) 2014-09-16 2016-03-24 Min Tang Integration of domain information into state transitions of a finite state transducer for natural language processing
US9747896B2 (en) 2014-10-15 2017-08-29 Voicebox Technologies Corporation System and method for providing follow-up responses to prior natural language inputs of a user
US10431214B2 (en) 2014-11-26 2019-10-01 Voicebox Technologies Corporation System and method of determining a domain and/or an action related to a natural language input
US10614799B2 (en) 2014-11-26 2020-04-07 Voicebox Technologies Corporation System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance
US10331784B2 (en) 2016-07-29 2019-06-25 Voicebox Technologies Corporation System and method of disambiguating natural language processing requests
DE102017211447B4 (en) * 2017-07-05 2021-10-28 Audi Ag Method for selecting a list entry from a selection list of an operating device by means of voice control and operating device

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4164025A (en) * 1977-12-13 1979-08-07 Bell Telephone Laboratories, Incorporated Spelled word input directory information retrieval system with input word error corrective searching
US5052038A (en) * 1984-08-27 1991-09-24 Cognitronics Corporation Apparatus and method for obtaining information in a wide-area telephone system with digital data transmission between a local exchange and an information storage site
US4956865A (en) * 1985-01-30 1990-09-11 Northern Telecom Limited Speech recognition
US4751736A (en) * 1985-01-31 1988-06-14 Communications Satellite Corporation Variable bit rate speech codec with backward-type prediction and quantization
US5226044A (en) * 1985-06-25 1993-07-06 Comsat Low-rate encoding/digital speech interpolation system
US4751737A (en) * 1985-11-06 1988-06-14 Motorola Inc. Template generation method in a speech recognition system
US4797910A (en) * 1986-05-07 1989-01-10 American Telephone and Telegraph Company, AT&T Bell Laboratories Automated operator assistance calls with voice processing
US4959855A (en) * 1986-10-08 1990-09-25 At&T Bell Laboratories Directory assistance call processing and calling customer remote signal monitoring arrangements
US4862498A (en) * 1986-11-28 1989-08-29 At&T Information Systems, Inc. Method and apparatus for automatically selecting system commands for display
DE3819178A1 (en) * 1987-06-04 1988-12-22 Ricoh Kk Speech recognition method and device
US4979206A (en) * 1987-07-10 1990-12-18 At&T Bell Laboratories Directory assistance systems
JPH01102599A (en) * 1987-10-12 1989-04-20 Internatl Business Mach Corp <Ibm> Voice recognition
US5127055A (en) * 1988-12-30 1992-06-30 Kurzweil Applied Intelligence, Inc. Speech recognition apparatus & method having dynamic reference pattern adaptation
US5086479A (en) * 1989-06-30 1992-02-04 Hitachi, Ltd. Information processing system using neural network learning function
JP2964507B2 (en) * 1989-12-12 1999-10-18 松下電器産業株式会社 HMM device
US5097509A (en) * 1990-03-28 1992-03-17 Northern Telecom Limited Rejection method for speech recognition
US5181237A (en) * 1990-10-12 1993-01-19 At&T Bell Laboratories Automation of telephone operator assistance calls
US5163083A (en) * 1990-10-12 1992-11-10 At&T Bell Laboratories Automation of telephone operator assistance calls
US5204894A (en) * 1990-11-09 1993-04-20 Bell Atlantic Network Services, Inc. Personal electronic directory
US5274695A (en) * 1991-01-11 1993-12-28 U.S. Sprint Communications Company Limited Partnership System for verifying the identity of a caller in a telecommunications network
US5390278A (en) * 1991-10-08 1995-02-14 Bell Canada Phoneme based speech recognition
US5515475A (en) * 1993-06-24 1996-05-07 Northern Telecom Limited Speech recognition method using a two-pass search
US5488652A (en) * 1994-04-14 1996-01-30 Northern Telecom Limited Method and apparatus for training speech recognition algorithms for directory assistance applications

Also Published As

Publication number Publication date
DE69715784D1 (en) 2002-10-31
DE69715784T2 (en) 2003-06-12
US5839107A (en) 1998-11-17
EP0845774B1 (en) 2002-09-25
EP0845774A2 (en) 1998-06-03
EP0845774A3 (en) 1999-01-20
CA2218196A1 (en) 1998-05-29
JPH10229449A (en) 1998-08-25

Similar Documents

Publication Publication Date Title
CA2218196C (en) Method and apparatus for automatically generating a speech recognition vocabulary from a white pages listing
US5987414A (en) Method and apparatus for selecting a vocabulary sub-set from a speech recognition dictionary for use in real time automated directory assistance
US6163596A (en) Phonebook
US6208964B1 (en) Method and apparatus for providing unsupervised adaptation of transcriptions
US5983177A (en) Method and apparatus for obtaining transcriptions from multiple training utterances
US6996531B2 (en) Automated database assistance using a telephone for a speech based or text based multimedia communication mode
US5956668A (en) Method and apparatus for speech translation with unrecognized segments
EP0890249B1 (en) Apparatus and method for reducing speech recognition vocabulary perplexity and dynamically selecting acoustic models
US6327343B1 (en) System and methods for automatic call and data transfer processing
US6462616B1 (en) Embedded phonetic support and TTS play button in a contacts database
US20030149566A1 (en) System and method for a spoken language interface to a large database of changing records
US6098040A (en) Method and apparatus for providing an improved feature set in speech recognition by performing noise cancellation and background masking
CA2219953C (en) Automated directory assistance system utilizing a heuristics model for predicting the most likely requested number
NZ294296A (en) Speech recognition for voice operated telephone services includes comparison against stored lists of expected words
EP1240642A1 (en) Learning of dialogue states and language model of spoken information system
CA2254816A1 (en) Text-to-speech driven annunciation of caller identification
JP2008015439A (en) Voice recognition system
US5995929A (en) Method and apparatus for generating an a priori advisor for a speech recognition dictionary
Sorin et al. Operational and experimental French telecommunication services using CNET speech recognition and text-to-speech synthesis
US7970610B2 (en) Speech recognition
Oberteuffer Commercial applications of speech interface technology: an industry at the threshold.
US7496508B2 (en) Method of determining database entries
EP1158491A2 (en) Personal data spoken input and retrieval
CA2256781A1 (en) Method and apparatus for automatically dialling a desired telephone number using speech commands
Németh et al. Language processing for name and address reading in Hungarian

Legal Events

Date Code Title Description
EEER Examination request
MKLA Lapsed