CA2224712A1 - Method and apparatus for information retrieval using audio interface - Google Patents

Method and apparatus for information retrieval using audio interface

Info

Publication number
CA2224712A1
Authority
CA
Canada
Prior art keywords
audio
document
server
interface device
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002224712A
Other languages
French (fr)
Inventor
Michael Abraham Benedikt
David Alan Ladd
James Christopher Ramming
Kenneth G. Rehor
Curtis Duane Tuckey
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AT&T Corp
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Publication of CA2224712A1
Legal status: Abandoned


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/02 - Methods for producing synthetic speech; Speech synthesisers
    • G10L13/027 - Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 - Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 - Network security protocols
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M3/00 - Automatic or semi-automatic exchanges
    • H04M3/42 - Systems providing special services or facilities to subscribers
    • H04M3/487 - Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 - Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M3/00 - Automatic or semi-automatic exchanges
    • H04M3/42 - Systems providing special services or facilities to subscribers
    • H04M3/487 - Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 - Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4938 - Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals comprising a voice browser which renders and interprets, e.g. VoiceXML
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M2201/00 - Electronic components, circuits, software, systems or apparatus used in telephone systems
    • H04M2201/40 - Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M2201/00 - Electronic components, circuits, software, systems or apparatus used in telephone systems
    • H04M2201/60 - Medium conversion
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M2207/00 - Type of exchange or network, i.e. telephonic medium, in which the telephonic communication takes place
    • H04M2207/40 - Type of exchange or network, i.e. telephonic medium, in which the telephonic communication takes place: terminals with audio html browser
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M7/00 - Arrangements for interconnection between switching centres
    • H04M7/006 - Networks other than PSTN/ISDN providing telephone service, e.g. Voice over Internet Protocol (VoIP), including next generation networks with a packet-switched transport layer

Abstract

A method and apparatus for retrieving information from a document server (160) using an audio interface device (110). In an advantageous embodiment, a telecommunications network includes an audio browsing node (150) comprising an audio processing node (152) and an audio interpreter node (154). An audio channel is established between the audio interface device and the audio browsing node. A document serving protocol channel (164) is established between the audio browsing node (150) and the document server (160). The document server (160) provides documents to the audio browsing node (150) via the document serving protocol channel (164). The audio browsing node (150) interprets the document into audio data and provides the audio data to the audio interface device (110) via the audio channel. The audio interface device (110) provides audio user input to the audio browsing node (150) via the audio channel. The audio browsing node (150) interprets the audio user input into user data appropriate to be provided to the document server (160) and provides the user data to the document server (160) via the document serving protocol channel (164).

Description

WO 97/40611 PCT/US97/03690

METHOD AND APPARATUS FOR INFORMATION RETRIEVAL USING AUDIO INTERFACE

Field of the Invention

The present invention relates to information retrieval in general. More particularly, the present invention relates to information retrieval over a network utilizing an audio user interface.

Background of the Invention

The amount of information available over communication networks is large and growing at a fast rate. The most popular of such networks is the Internet, which is a network of linked computers around the world. Much of the popularity of the Internet may be attributed to the World Wide Web (WWW) portion of the Internet.
The WWW is a portion of the Internet in which information is typically passed between server computers and client computers using the Hypertext Transfer Protocol (HTTP). A server stores information and serves (i.e. sends) the information to a client in response to a request from the client. The clients execute computer software programs, often called browsers, which aid in the requesting and displaying of information. Examples of WWW browsers are Netscape Navigator, available from Netscape Communications, Inc., and the Internet Explorer, available from Microsoft Corp.
Servers, and the information stored therein, are identified through Uniform Resource Locators (URLs). URLs are described in detail in Berners-Lee, T., et al., Uniform Resource Locators, RFC 1738, Network Working Group, 1994, which is incorporated herein by reference. For example, the URL http://www.hostname.com/document1.html¹ identifies the document "document1.html" at host server "www.hostname.com". Thus, a request for information from a host server by a client generally includes a URL. The information passed from a server to a client is generally called a document. Such documents are generally defined in terms of a document language, such as Hypertext Markup Language (HTML). Upon request from a client, a server sends an HTML document to the client. HTML documents contain information which is used by the browser to display information to a user at a computer display screen. An HTML document may contain text, logical structure commands, hypertext links, and user input commands. If the user selects (for example by a mouse click) a hypertext link from the display, the browser will request another document from a server.

¹ Illustrative URLs are used herein for example purposes only. There is no significance to the use of any particular URL other than for exemplification of the present invention. No reference to actual URLs is intended.

CA 02224712 1997-12-15
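The client/server exchange described above starts from a URL naming a host server and a document on it. As a rough sketch, using the illustrative URL from the text and Python's standard `urllib.parse` (a real browser's URL handling is more involved):

```python
from urllib.parse import urlparse

# The illustrative URL from the text; it names a document on a host server.
url = "http://www.hostname.com/document1.html"

parts = urlparse(url)
print(parts.scheme)  # "http" - the transfer protocol to use
print(parts.netloc)  # "www.hostname.com" - the host server storing the document
print(parts.path)    # "/document1.html" - the document being requested
```

A client's request to the server carries the path portion, and the server answers with the named document.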
Currently, WWW browsers are based upon textual and graphical user interfaces. Thus, documents are presented as images on a computer screen. Such images include, for example, text, graphics, hypertext links, and user input dialog boxes. All user interaction with the WWW is through a graphical user interface. Although audio data is capable of being received and played back at a user computer (e.g. a .wav or .au file), such receipt of audio data is secondary to the graphical interface of the WWW. Thus, audio data may be sent as a result of a user request, but there is no means for a user to interact with the WWW using an audio interface.

Summary of the Invention

The present invention provides a method and apparatus for retrieving information from a document server using an audio interface device (e.g. a telephone).
An interpreter is provided which receives documents from a document server operating in accordance with a document serving protocol. The interpreter interprets the document into audio data which is provided to the audio user interface. The interpreter also receives audio user input from the audio interface device. The interpreter interprets the audio user input into user data which is appropriate to be sent to the document server in accordance with the document serving protocol and provides the user data to the server. In various embodiments, the interpreter may be located within the audio user interface, within the document server, or disposed in a communication channel between the audio user interface and the document server.
In accordance with one embodiment, a telecommunications network node for carrying out the audio browsing functions of the present invention is included as a node in a telecommunications network, such as a long distance telephone network. An audio channel is established between the audio interface device and the node. A document serving protocol channel is established between the node and the document server. The node receives documents served by the document server in accordance with the document serving protocol and interprets the documents into audio data appropriate for the audio user interface. The node then sends the audio data to the audio interface device via the audio channel. The node also receives audio user input (e.g. DTMF tones or speech) from the audio interface device and interprets the audio user input into user data appropriate for the document server. The node then sends the user data to the document server in accordance with the document serving protocol.
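The node's two interpreting directions can be pictured as a pair of conversion functions. The sketch below is illustrative only: the function names and the stand-in string/byte encodings are assumptions, since the real modules operate on audio signals rather than text.

```python
# Minimal sketch of the node's two interpreting directions. The encodings are
# hypothetical stand-ins: real text-to-speech and DTMF/speech recognition
# would process audio signals, not strings.

def document_to_audio(document_text: str) -> bytes:
    """Interpret served document text into audio data for the audio channel."""
    return ("SPOKEN: " + document_text).encode("utf-8")

def audio_input_to_user_data(audio_event: str) -> dict:
    """Interpret audio user input (DTMF or speech) into protocol-ready user data."""
    if audio_event.startswith("DTMF:"):
        return {"choice": audio_event.split(":", 1)[1]}
    return {"utterance": audio_event}

audio = document_to_audio("Welcome. Press 1 for the main menu.")
user_data = audio_input_to_user_data("DTMF:1")
```

The first direction feeds the audio channel toward the telephone; the second feeds the document serving protocol channel toward the server.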
In one embodiment, the document server is a World Wide Web document server which communicates with clients via the hypertext transfer protocol. In accordance with the advantages of the present invention, a user can engage in an audio browsing session with a World Wide Web document server via an audio interface device. The World Wide Web document server can treat such a browsing session in a conventional manner and does not need to know whether the particular browsing session is being initiated from a client executing a conventional graphical browser or from an audio interface device. The necessary interpreting functions are carried out in the telecommunications network node and these functions are transparent to both a user using the audio interface device and the World Wide Web document server operating in accordance with the hypertext transfer protocol.
These and other advantages of the invention will be apparent to those of ordinary skill in the art by reference to the following detailed description and the accompanying drawings.

Brief Description of the Drawings

Fig. 1 shows a diagram of a telecommunications system which is suitable to practice the present invention.
Fig. 2 is a block diagram of the components of the audio processing node.
Fig. 3 is a block diagram of the components of the audio interpreter node.
Fig. 4 is a block diagram of a document server.
Fig. 5 is an example audio-HTML document.
Fig. 6 is an example HTML document.
Fig. 7 is a block diagram of an embodiment in which the audio browsing functions are implemented at a user interface device.
Fig. 8 is a block diagram of the components of the user interface device of Fig. 7.
Fig. 9 is a block diagram of an embodiment in which the audio browsing functions are implemented at an audio browsing document server.
Fig. 10 is a block diagram of the components of the audio browsing document server of Fig. 9.

Fig. 11 is a block diagram of an embodiment in which the audio interpreting functions are implemented at an audio interpreter document server.
Fig. 12 is a block diagram of the components of the audio interpreter document server of Fig. 11.
Detailed Description

Fig. 1 shows a diagram of a telecommunications system 100 which is suitable to practice the present invention. An audio interface device, such as telephone 110, is connected to a local exchange carrier (LEC) 120. Audio interface devices other than a telephone may also be used. For example, the audio interface device could be a multimedia computer having telephony capabilities. In accordance with the present invention, a user of telephone 110 places a telephone call to a telephone number associated with information provided by a document server, such as document server 160. In the exemplary embodiment shown in Fig. 1, the document server 160 is part of communication network 162. In an advantageous embodiment, network 162 is the Internet.
Telephone numbers associated with information accessible through a document server, such as document server 160, are set up so that they are routed to special telecommunication network nodes, such as audio browsing adjunct 150. In the embodiment shown in Fig. 1, the audio browsing adjunct 150 is a node in telecommunications network 102 which is a long distance telephone network. Thus, the call is routed to the LEC 120, which further routes the call to a long distance carrier switch 130 via trunk 125. Long distance network 102 would generally have other switches similar to switch 130 for routing calls. However, only one switch is shown in Fig. 1 for clarity. It is noted that switch 130 in the telecommunications network 102 is an "intelligent" switch, in that it contains (or is connected to) a processing unit 131 which may be programmed to carry out various functions. Such use of processing units in telecommunications network switches, and the programming thereof, is well known in the art. Upon receipt of the call at switch 130, the call is then routed to the audio browsing adjunct 150. Thus, there is established an audio channel between telephone 110 and audio browsing adjunct 150. The routing of calls through a telecommunications network is well known in the art and will not be described further herein.
In one embodiment, audio browsing services in accordance with the present invention are provided only to users who are subscribers to an audio browsing service provided by the telecommunication network 102 service provider. In such an embodiment, a database 140 connected to switch 130 contains a list of such subscribers. Switch 130 performs a database 140 lookup to determine if the call originated from a subscriber to the service. One way to accomplish this is to store a list of calling telephone numbers (ANI) in database 140. In a manner which is well known, the LEC 120 provides switch 130 with the ANI of the telephone 110. The switch 130 performs a database 140 lookup to determine if the ANI is included in the list of subscribers to the audio browsing service stored in database 140. If the ANI is present in that list, then the switch 130 routes the call to the audio browsing adjunct 150 in accordance with the present invention. If the ANI does not belong to a subscriber to the audio browsing service, then an appropriate message may be sent to telephone 110.
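The subscriber check amounts to a membership lookup keyed on the caller's ANI. A minimal sketch, in which the ANI values are hypothetical and an in-memory set stands in for database 140:

```python
# Hypothetical contents of database 140: ANIs of audio browsing subscribers.
SUBSCRIBER_ANIS = {"2125551234", "9085555678"}

def route_call(ani: str) -> str:
    """Decide at switch 130 where an incoming call goes, based on its ANI."""
    if ani in SUBSCRIBER_ANIS:
        return "route to audio browsing adjunct 150"
    return "play non-subscriber message"

print(route_call("2125551234"))  # subscriber: routed to the adjunct
print(route_call("3125550000"))  # non-subscriber: gets a message
```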
The audio browsing adjunct 150 contains an audio processing node 152 and an audio interpreter node 154, both of which will be described in further detail below.
The audio browsing adjunct 150 provides the audio browsing functionality in accordance with the present invention.
Upon receipt of the call from telephone 110, the audio browsing adjunct 150 establishes a communication channel with the document server 160 associated with the called telephone number via link 164. The association of a telephone number with a document server will be described in further detail below. In a WWW embodiment, link 164 is a socket connection over TCP/IP, the establishment of which is well known in the art. For additional information on TCP/IP, see Comer, Douglas, Internetworking with TCP/IP: Principles, Protocols, and Architecture, Englewood Cliffs, NJ, Prentice Hall, 1988, which is incorporated by reference herein. Audio browsing adjunct 150 and the document server 160 communicate with each other using a document serving protocol. As used herein, a document serving protocol is a communication protocol for the transfer of information between a client and a server. In accordance with such a protocol, a client requests information from a server by sending a request to the server and the server responds to the request by sending a document containing the requested information to the client. Thus, a document serving protocol channel is established between the audio browsing adjunct 150 and the document server 160 via link 164. In an advantageous WWW embodiment, the document serving protocol is the Hypertext Transfer Protocol (HTTP). This protocol is well known in the art of WWW communication and is described in detail in Berners-Lee, T. and Connolly, D., Hypertext Transfer Protocol (HTTP), Working Draft of the Internet Engineering Task Force, 1993, which is incorporated herein by reference.
Thus, the audio browsing adjunct 150 communicates with the document server 160 using the HTTP protocol. Thus, as far as the document server 160 is concerned, it behaves as if it were communicating with any conventional WWW client executing a conventional graphical browser. Thus, the document server 160 serves documents to the audio browsing adjunct 150 in response to requests it receives over link 164. A document, as used herein, is a collection of information. The document may be a static document in that the document is pre-defined at the server 160 and all requests for that document result in the same information being served. Alternatively, the document could be a dynamic document, whereby the information which is served in response to a request is dynamically generated at the time the request is made.
Typically, dynamic documents are generated by scripts, which are programs executed by the server 160 in response to a request for information. For example, a URL may be associated with a script. When the server 160 receives a request including that URL, the server 160 will execute the script to generate a dynamic document, and will serve the dynamically generated document to the client which requested the information. The use of scripts to dynamically generate documents is well known in the art.
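A dynamic document can be sketched as a script the server runs at request time, so that two requests for the same URL may yield different content. The script body and its output below are purely illustrative assumptions, not taken from the patent:

```python
import datetime

def greeting_script() -> str:
    """Hypothetical server-side script: builds an HTML document per request."""
    now = datetime.datetime.now()
    return f"<HTML><BODY>The current time is {now:%H:%M}.</BODY></HTML>"

# Each request re-runs the script, so the served document can differ each time.
document = greeting_script()
```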

The documents served by server 160 include text, logical structure commands, hypertext links, and user input commands. One characteristic of these documents is that the physical structure of the information contained in the document (i.e., the physical layout view of the information when displayed at a client executing a conventional graphics browser), is not defined. Instead, a document contains logical structure commands, which are interpreted at a browser to define a physical layout.
For example, such logical structure commands include emphasis commands, new paragraph commands, etc. The syntactic structure of such commands may conform to the conventions of a more general purpose document structuring language, such as Standard Generalized Markup Language (SGML), which is described in Goldfarb, Charles, The SGML Handbook, Clarendon Press, 1990, which is incorporated by reference herein. In the WWW embodiment of the present invention, these documents are Hypertext Markup Language (HTML) documents. HTML is a well known language based on SGML which is used to define documents which are served by WWW servers. HTML is described in detail in Berners-Lee, T. and Connolly, D., Hypertext Markup Language (HTML), Working Draft of the Internet Engineering Task Force, 1993, which is incorporated herein by reference.
When an HTML document is received by a client executing a conventional browser, the browser interprets the HTML document into an image and displays the image upon a computer display screen. However, in accordance with the principles of the present invention, upon receipt of a document from document server 160, the audio browsing adjunct 150 converts the document into audio data. The details of such conversion will be discussed in further detail below. The audio data is then sent to telephone 110 via switch 130 and LEC 120. Thus, in this manner, the user of telephone 110 can access information from document server 160 via an audio interface.
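The conversion step can be pictured as translating logical structure commands into spoken cues and discarding purely visual markup before handing text to a text-to-speech stage. The toy tag handling below is an assumption for illustration; the audio-HTML interpretation described later in the patent is far richer:

```python
import re

def html_to_speech_text(html: str) -> str:
    """Reduce an HTML fragment to text suitable for a text-to-speech module."""
    text = re.sub(r"</?(EM|B|I)>", "", html)         # drop emphasis commands
    text = re.sub(r"<P>", " New paragraph. ", text)  # voice the paragraph breaks
    text = re.sub(r"<[^>]+>", "", text)              # drop remaining tags
    return " ".join(text.split())

speech = html_to_speech_text("<HTML><BODY><P>Hello <EM>world</EM></BODY></HTML>")
```

A fuller interpreter would instead map emphasis to prosody changes and render hypertext links and input commands as audible prompts.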
In addition, the user can send audio user input from the telephone 110 back to the audio browsing adjunct 150. This audio user input may be, for example, speech signals or DTMF tones. The audio browsing adjunct 150 converts the audio user input into user data or instructions which are appropriate for transmitting to the document server 160 via link 164 in accordance with the HTTP protocol. The user data or instructions are then sent to the document server 160 via the document serving protocol channel. Thus, user interaction with the document server is via an audio user interface.
In this manner, a user can engage in a browsing session with a WWW document server via an audio interface. The document server can treat such a browsing session in a conventional manner and does not need to know whether a particular browsing session is being initiated from a client executing a conventional graphical browser or from an audio interface such as a telephone. The audio browsing adjunct 150 within the network 102 interprets the documents being served by document server 160 into audio data appropriate to be sent to telephone 110. In addition, the audio browsing adjunct 150 interprets audio user input received at telephone 110 into user data appropriate to be received by the document server 160.
A more detailed description of an advantageous embodiment will now be given in conjunction with an example browsing session. Assume a user at telephone 110 dials the number (123) 456-7890² which has been set up to be associated with information accessible through document server 160 and therefore routed to audio browsing adjunct 150. The call gets routed to LEC 120, at which point LEC 120 recognizes the telephone number as one which is to be routed to long distance network 102, and more particularly to switch 130. Upon receipt of the call, switch 130 in turn routes the call to the audio browsing adjunct 150 via link 132. Thus, there is established an audio channel between telephone 110 and audio browsing adjunct 150.

² Telephone numbers are used herein for example purposes only. There is no significance to the use of any particular telephone number other than for exemplification of the present invention. No reference to actual telephone numbers is intended.

Further details of the audio processing node 152 are shown in Fig. 2. The audio processing node 152 comprises a telephone network interface module 210, a DTMF decoder/generator 212, a speech recognition module 214, a text to speech module 216, and an audio play/record module 218, each of which is connected to an

audio bus 220 and a control/data bus 222, as shown in Fig. 2. Further, the audio processing node 152 contains a central processing unit 224, memory unit 228, and a packet network interface 230, each of which is connected to the control/data bus 222.
The overall functioning of the audio processing node 152 is controlled by the central processing unit 224. Central processing unit 224 operates under control of executed computer program instructions 232 which are stored in memory unit 228. Memory unit 228 may be any type of machine readable storage device. For example, memory unit 228 may be a random access memory (RAM), a read only memory (ROM), a programmable read only memory (PROM), an erasable programmable read only memory (EPROM), an electronically erasable programmable read only memory (EEPROM), a magnetic storage media (i.e. a magnetic disk), or an optical storage media (i.e. a CD-ROM). Further, the audio processing node 152 may contain various combinations of machine readable storage devices, which are accessible by the central processing unit 224, and which are capable of storing a combination of computer program instructions 232 and data 234.
The telephone network interface module 210 handles the low level interaction between the audio processing node 152 and telephone network switch 130. In one embodiment, module 210 consists of one or more analog tip/ring loop start telephone line terminations. Through module 210, central processing unit 224 is able to control link 132 via control/data bus 222. Control functions include on-hook/off-hook, ring detection, and far-end on-hook detection. In an alternate embodiment, module 210 includes one or more channelized digital interfaces, such as T1/DS1, E1, or PRI. Signaling can be in-band or out-of-band. The DTMF decoder/generator 212 handles the conversion of DTMF tones into digital data and the generation of DTMF tones from digital data. The speech recognition module 214 performs speech recognition of speech signals originating at user telephone 110 and received over the audio bus 220.
Such speech signals are processed and converted into digital data by the speech recognition module 214. The text to speech module 216 converts text of documents received from document server 160 into audio speech signals to be transmitted to a user at telephone 110. The audio play/record module 218 is used to play audio data received from document server 160 at telephone 110 and to record audio data such as a user's voice. It is noted that each module 210, 212, 214, 216, 218 are shown as separate functional modules in Fig. 2. The functionality of each of modules 212, 214, 216, and 218 may be implemented in hardware, software, or a combination of hardware and software, using well known signal processing techniques. The functionality of module 210 may be implemented in hardware or a combination of hardware and software, using well known signal processing techniques. The functioning of each of these modules will be described in further detail below in conjunction with the example. The packet network interface 230 is used for communication between the audio processing node 152 and the audio interpreter node 154.
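The decoding and generation performed by a DTMF module such as 212 rest on the standard DTMF assignment: each keypad key is signaled as the sum of one low-group and one high-group sine tone. A sketch of the key-to-frequency mapping (the frequency table is the standard one; the code itself is illustrative, not the patent's implementation):

```python
# Standard DTMF frequency groups (Hz): each key pairs one low and one high tone.
LOW_HZ = [697, 770, 852, 941]   # keypad rows 1-4
HIGH_HZ = [1209, 1336, 1477]    # keypad columns 1-3
KEYPAD = "123456789*0#"         # row-major keypad order

def dtmf_pair(key: str) -> tuple:
    """Return the (low, high) frequency pair that encodes a keypad key."""
    i = KEYPAD.index(key)
    return LOW_HZ[i // 3], HIGH_HZ[i % 3]

pair_for_1 = dtmf_pair("1")  # (697, 1209)
pair_for_0 = dtmf_pair("0")  # (941, 1336)
```

Generation would synthesize the two sinusoids and sum them; decoding would detect which pair of frequencies is present in the incoming audio.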
The audio browsing adjunct 150 also contains an audio interpreter node 154 which is connected to the audio processing node 152. The audio interpreter node 154 is shown in further detail in Fig. 3. Audio interpreter node 154 contains a central processing unit 302, a memory 304, and two packet network interfaces 306 and 308 connected by a control/data bus 310. The overall functioning of the audio interpreter node 154 is controlled by the central processing unit 302. Central processing unit 302 operates under control of executed computer program instructions 312 which are stored in memory unit 304.
Memory unit 304 may be any type of machine readable storage device. For example, memory unit 304 may be a random access memory (RAM), a read only memory (ROM), a programmable read only memory (PROM), an erasable programmable read only memory (EPROM), an electronically erasable programmable read only memory (EEPROM), a magnetic storage media (i.e. a magnetic disk), or an optical storage media (i.e. a CD-ROM). Further, the audio interpreter node 154 may contain various combinations of machine readable storage devices, which are accessible by the central processing unit 302, and which are capable of storing a combination of computer program instructions 312 and data 314.

The control of an apparatus, such as the audio processing node 152 and the audio interpreter node 154, using a central processing unit executing software instructions is well known in the art and will not be described in further detail herein.
Returning now to the example, the call placed from telephone 110 to telephone number (123) 456-7890, has been routed to the audio browsing adjunct 150, and in particular to the audio processing node 152. The central processing unit 224 detects the ringing line through the telephone network interface module 210. Upon detection of the call, the central processing unit performs a lookup to determine the URL which is associated with the dialed number (DN). The dialed telephone number (DN) is provided to switch 130 from the local exchange carrier 120 in a manner which is well known in the art, and in turn, the DN is provided to the audio browsing adjunct 150 from switch 130. A list of URLs which are associated with DNs is stored as data 234 in memory 228. Assume in the present example the DN (123) 456-7890 is associated with URL http://www.att.com/~phone/greeting.
In an alternate embodiment, the list of URLs associated with various DNs is stored in a network database, such as database 140, instead of locally at the audio browsing adjunct 150. In such an embodiment, the central processing unit 224 of the audio processing node 152 sends a signal to network switch 130 to request a lookup to database 140. The switch would request the URL from database 140 and return the resulting URL to the audio processing node 152. It is noted that the communication between the audio processing node 152, switch 130 and database 140, may be via an out of band signaling system, such as SS7, which is well known in the art. An advantage to this configuration is that a plurality of audio browsing adjuncts may be present in the network 102, and each may share a single database 140. In this manner, only one database 140 needs to be updated with URLs and associated DNs.
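Either placement of the list amounts to the same DN-to-URL mapping. A minimal sketch using the illustrative number and URL from the text, with a dictionary standing in for data 234 (or for the shared database 140):

```python
# Stand-in for the DN-to-URL list (data 234 locally, or network database 140).
# The number/URL pair is the illustrative one used in the text.
DN_TO_URL = {
    "1234567890": "http://www.att.com/~phone/greeting",
}

def url_for_dn(dn: str):
    """Look up the URL associated with a dialed number; None if unknown."""
    return DN_TO_URL.get(dn)

url = url_for_dn("1234567890")      # the greeting document's URL
unknown = url_for_dn("0000000000")  # None: no associated document
```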
After receiving the URL associated with the DN, the central processing unit 224 of the audio processing node 152 sends a message (including the URL) to the audio interpreter node 154 instructing the audio interpreter node 154 to initiate an audio interpreting/browsing session. Such a message is passed from the central processing unit 224 to the packet network interface 230 via the control/data bus 222.


CA 022247l2 l997-l2-l5 V~O 97/40611 PCTAUS97/03690 The mf~ss~,e is sent from packet network interface 230 of the audio pro~Ps~in~ node 152 to the packet network interface 306 of the audio inlelyl~ g node 154 via connection 153. In an advantageous embodiment, the audio procçccin~ node 152 andthe audio illt~ .lel node 154 are collocated and thus form an integrated audio s browsing adjunct 150. In ~It~?rn~t~ embo~liml?nt~, the audio processing node 152 and the audio inh.~l~,k;l node 154 may be geographically separated. Several such alternate embodiments are described below. The connection 153 may be a packet data network connection (e.g., TCP/IP connection over Ethernet) which is well known in the art.
Returning now to the example, the audio interpreter node 154 receives a message via packet network interface 306 that it is to initiate a new audio interpreting/browsing session. The central processing unit 302 is capable of controlling multiple audio interpreting/browsing sessions for multiple users simultaneously. Such multiprocess execution by a processor is well known, and generally entails the instantiation of a software process for controlling each of the sessions. Upon the initiation of an audio interpreting/browsing session, the audio interpreting node 154 sends an HTTP request for URL http://www.att.com/~phone/greeting to the document server 160 over connection 164. In this example, it is assumed that the document server 160 is associated with the host name www.att.com.
Document server 160 is shown in further detail in Fig. 4. Document server 160 is a computer containing a central processing unit 402 connected to a memory 404. The functions of the document server 160 are controlled by the central processing unit 402 executing computer program instructions 416 stored in memory 404. In operation, the document server 160 receives requests for documents from the audio interpreter node 154 via connection 164 and packet network interface 440. The central processing unit 402 interprets the requests and retrieves the requested information from memory 404. Such requests may be for HTML documents 408, audio-HTML documents 410, audio files 412, or graphics files 414. HTML documents 408 are well known and contain conventional HTML instructions for use in conventional WWW graphical browsers. An audio-HTML document is similar to an HTML document but has additional instructions which are particularly directed to interpretation by the audio interpreter node 154 in accordance with the present invention. Such instructions which are particular to the audio browsing aspects of the present invention will be identified herein as audio-HTML instructions. The details of audio-HTML documents and audio-HTML instructions will be described in further detail below. Audio files 412 are files which contain audio information. Graphics files 414 are files which contain graphical information. In a manner which is well known in the art, a URL identifies a particular document on a particular document server. Memory 404 may also contain scripts 418 for dynamically generating HTML documents and audio-HTML documents. Thus, returning to the present example, an HTTP request for URL http://www.att.com/~phone/greeting is received by the document server 160 from the audio interpreter node 154 via connection 164.
The document server interprets this URL and retrieves an audio-HTML page from memory 404 under central processing unit 402 control. The central processing unit 402 then sends this audio-HTML document to the audio interpreter node 154 via packet network interface 440 and link 164.
The audio-HTML document 500 which is sent in response to the request for URL http://www.att.com/~phone/greeting, and which is received by the audio interpreter node 154, is shown in Fig. 5. The audio interpreter node 154 begins interpreting the document 500 as follows. In one embodiment the <HEAD> section, lines 502-506, of the document 500, including the title of the page, is not converted into voice, and is ignored by the audio interpreter node 154. In alternate embodiments, the <TITLE> section may be interpreted using text to speech as described below.
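The interpretation rule just described, ignoring the <HEAD> section and beginning conversion at the <BODY>, can be sketched with Python's standard html.parser module (a simplified illustration of the behavior, not the interpreter node's actual implementation):

```python
from html.parser import HTMLParser

class HeadSkippingInterpreter(HTMLParser):
    """Collects only body text for text-to-speech, skipping <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.spoken_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

    def handle_data(self, data):
        # Text inside <head> (including <title>) is ignored, per the
        # embodiment described; body text is queued for speech conversion.
        if not self.in_head and data.strip():
            self.spoken_text.append(data.strip())

parser = HeadSkippingInterpreter()
parser.feed("<html><head><title>Greeting</title></head>"
            "<body>Hello!</body></html>")
```

The alternate embodiment, in which the <TITLE> is spoken, would simply relax the `in_head` check for title text.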
The text "Hello!" at line 508 in the <BODY> section of the document 500 is sent from the audio interpreter node 154 to the audio processing node 152 via packet network interface 306 and link 153. Along with the text "Hello!", the audio interpreter node 154 sends instructions to the audio processing node 152 that the text is to be processed by the text to speech module 216. The audio processing node 152 receives the text and instructions via the packet network interface 230, and the text is supplied to the text to speech module 216 via control/data bus 222. The text to speech module 216 generates the audio signal to play "Hello!"3 and sends the signal to the telephone network interface module 210 via audio bus 220. The telephone network interface module 210 then sends the audio signal to telephone 110. It is noted that text to speech conversion is well known and conventional text to speech techniques may be used by the text to speech module 216. For example, the punctuation "!" in the text may be interpreted as increased volume when the text is converted to speech.
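The division of labor at this step, in which the interpreter node names the module that should handle each piece of content and the processing node dispatches accordingly, can be sketched as a small message protocol (the message format and handler names are assumptions for illustration, not defined by the specification):

```python
def make_tts_message(text):
    """Interpreter-side: wrap text with an instruction naming the module
    (here standing in for text to speech module 216) that should process it."""
    return {"module": "text_to_speech", "payload": text}

def dispatch(message, modules):
    """Processing-node-side: route the payload to the named module."""
    handler = modules[message["module"]]
    return handler(message["payload"])

# A stand-in for the text to speech module; a real module would
# synthesize an audio signal for the telephone network interface.
modules = {"text_to_speech": lambda text: f"<audio for '{text}'>"}
audio = dispatch(make_tts_message("Hello!"), modules)
```

The same dispatch shape serves the other modules (DTMF decoder/generator, audio play/record) with different `module` names.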
Line 510 of document 500 is a form instruction, and the audio interpreter node 154 does not send anything to the audio processing node 152 in connection with this instruction. The audio interpreter node 154 interprets line 510 to indicate that it will be expecting a future response from the user, and that this response is to be given as an argument to the script identified by http://machine:8888/hastings-bin/getscript.sh.
Line 512 is an audio-HTML instruction. The audio interpreter node 154 interprets line 512 by sending an HTTP request to server 160 for the audio file identified by www-spr.ih.att.com/~hastings/annc/greeting.mu8, which resides in memory 404 in storage area 412. The document server 160 retrieves the audio file from memory 404 and sends it to the audio interpreter node 154 via link 164. Upon receipt of the audio file, the audio interpreter node 154 sends the file, along with instructions indicating that the file is to be played by the audio play/record module 218, to the audio processing node 152. Upon receipt of the file and instructions, the audio processing node 152 routes the audio file to the audio play/record module 218. The audio play/record module 218 generates an audio signal which is sent to the telephone network interface module 210 via audio bus 220. The telephone network interface module 210 then sends the audio signal to the telephone 110. As a result, the user at telephone 110 hears the contents of the audio file www-spr.ih.att.com/~hastings/annc/greeting.mu8 at the speaker of telephone 110.
Lines 514-516 are audio-HTML instructions. The audio interpreter node 154 does not send line 514 to the audio processing node 152. Line 514 indicates that a response from the user is to be sent to the document server 160 associated with the variable name "collectvar". This instruction marks the beginning of a prompt-and-collect sequence in which the user will be prompted for, and supply, information.

3 Italic type is used herein to indicate text which is played as audio speech.
This instruction is followed by a prompt instruction 516 and a set of choice instructions 518-522. The audio interpreter node 154 processes line 516 in a manner similar to that of line 512, and as a result, the user at telephone 110 hears the audio from the file identified by http://www-spr.ih.att.com/~hastings/annc/choices.mu8. The audio will ask the user to make a selection based upon some criteria, and the audio interpreter node 154 will wait for a response from the user at telephone 110. Also, as a result of processing line 516, the central processing unit 302 sends a message to the audio processing node 152 instructing the telephone network interface module 210 to be prepared to receive audio input.
The user responds with audio user input from telephone 110. The audio user input may be in the form of DTMF tones generated by the user pressing a key on the keypad of telephone 110. For example, if the user presses "2" on the telephone 110 keypad, the DTMF tone associated with "2" is received by the audio processing node 152 via the telephone network interface module 210. The audio signal is recognized as a DTMF tone by the central processing unit 224, and instructions are passed to the telephone network interface module 210 to send the signal to the DTMF decoder/generator 212 via the audio bus 220. The central processing unit 224 instructs the DTMF decoder/generator 212 to convert the DTMF tone into digital data and to pass the digital data to the packet network interface 230 for transmission to the audio interpreter node 154. Upon receipt of the signal, the audio interpreter node 154 recognizes that the user has responded with choice 2, which corresponds with the value "Jim" as indicated by line 520 of the audio-HTML document 500. Thus, the audio interpreter node 154 sends the value "Jim" associated with the variable "collectvar" to the script http://machine:8888/hastings-bin/getscript.sh identified in line 510 of document 500. If the user responds with input which is not listed as a choice, in this example, a response other than 1-3, or if the user does not respond within a certain time period, then the audio interpreter node 154 instructs the text to speech module 216 to generate a speech signal "choice not understood, try again", and that signal is provided to the user at telephone 110.
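The prompt-and-collect handling above can be sketched as a lookup from DTMF digits to choice values, with unlisted or missing input producing the retry prompt. Only choice 2 ("Jim") is given in the text; the other choice values here are hypothetical stand-ins:

```python
# Choice "2" -> "Jim" follows the example in the text; "Ann" and "Sue"
# are hypothetical stand-ins for the other choices at lines 518-522.
CHOICES = {"1": "Ann", "2": "Jim", "3": "Sue"}

def collect_choice(dtmf_digit, choices):
    """Map a decoded DTMF digit to its choice value.

    Returns (value, retry_prompt): unlisted or absent input yields no
    value plus the retry prompt to be spoken by the text to speech module.
    """
    if dtmf_digit in choices:
        return choices[dtmf_digit], None
    return None, "choice not understood, try again"

value, retry = collect_choice("2", CHOICES)
```

A successful collection would then be submitted as the "collectvar" argument to the form's script URL.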
Alternatively, audio user input may be in the form of a voice signal. Instead of the user pressing the number 2 on the telephone 110 keypad, the user will speak the word "two" into the telephone 110 microphone. The voice signal is received by the audio processing node 152 via the telephone network interface module 210. The audio signal is recognized as a voice signal by the central processing unit 224, and instructions are passed to the telephone network interface module 210 to send the signal to the speech recognition module 214 via the audio bus 220. The central processing unit 224 instructs the speech recognition module 214 to convert the voice signal into digital data and to pass the data to the packet network interface 230 for transmission to the audio interpreter node 154. Upon receipt, the audio interpreter node 154 processes the data as described above in conjunction with the DTMF audio user input. It is noted that the speech recognition module 214 operates in accordance with conventional speech recognition techniques which are well known in the art.
Hypertext links often appear in HTML documents. When displayed on the screen of a computer executing a conventional graphical browser, a hypertext link will be graphically identified (e.g., underlined). If a user graphically selects a link, for example by clicking on the link with a mouse, then the browser generates a request for the document indicated by the link and sends the request to the document server.
Consider the HTML document 600 shown in Fig. 6. Lines 604 and 605 specify a conventional HTML description of hypertext links. If this page were being processed by a conventional graphical browser, the display would look like:

This page gives you a choice of links to follow to other World Wide Web pages. Please click on one of the links below.

click here for information on cars
click here for information on trucks

The user would then select one of the links using a graphical pointing device such as a mouse. If the user selects the link click here for information on cars then the browser would generate a request for the document identified by the URL http://www.abc.com/cars.html. If the user selects the link click here for information on trucks then the browser would generate a request for the document identified by the URL http://www.abc.com/trucks.html.

The processing of HTML hypertext links in accordance with the present invention will now be described with reference to Fig. 6. Assume that the document server 160 has served the HTML document 600 shown in Fig. 6 to the audio interpreter node 154. Lines 602 and 603 will be converted to audio signals by the text to speech module 216 and provided to the user telephone 110 as described above.
Thus, the user will hear the audio, This page gives you a choice of links to follow to other World Wide Web pages. Please click on one of the links below. When line 604 is reached, the audio interpreter node 154 will recognize line 604 as being a hypertext link. The audio interpreter node 154 sends an instruction to the audio processing node 152, instructing the DTMF decoder/generator 212 to generate a tone to the telephone 110. Alternatively, the tone could be generated by the audio interpreter node 154 sending an instruction to the audio processing node 152, instructing the audio play/record module 218 to play an audio file containing tone audio. The particular tone is one which is used to signify the beginning of a hypertext link to the user. The audio interpreter node 154 then supplies the text of the hypertext link, "click here for information on cars", to the audio processing node 152 with an instruction indicating that the text is to be processed by the text to speech module 216. As a result, the speech audio signal "click here for information on cars" is provided to the telephone 110. The audio interpreter node 154 then sends an instruction to the audio processing node 152, instructing the DTMF decoder/generator 212 to generate a tone to the telephone 110. This particular tone is one which is used to signify the end of a hypertext link to the user. The tones used to signify the beginning and end of hypertext links may be the same or different tones. The ending tone is followed by a pause. As an alternative to using tones, the beginning and end of a hypertext link may be identified by speech audio signals such as "begin link [hypertext] end link". If the user wishes to follow the link, then the user supplies user audio input during the pause. For example, suppose the user wanted to follow the link "click here for information on cars".
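The link rendering sequence described above (begin tone, spoken link text, end tone, pause for input) can be sketched as an ordered list of playback events handed to the audio processing node. The event representation is an assumption for illustration:

```python
def render_link_audio(link_text):
    """Produce the ordered audio events for one hypertext link."""
    return [
        ("tone", "begin_link"),    # DTMF decoder/generator plays a tone
        ("speech", link_text),     # text to speech module speaks the text
        ("tone", "end_link"),      # same or different tone marks the end
        ("pause", "await_input"),  # input during this pause follows the link
    ]

events = render_link_audio("click here for information on cars")
```

Input arriving during the final pause would be interpreted as selecting this link's URL.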
The user would enter audio input during the pause following the generated speech audio signal for the link. The audio input may be, for example, a DTMF tone generated by pressing a key on the telephone 110 keypad. The DTMF tone is received by the audio processing node 152 and processed by the DTMF decoder/generator 212. Data representing the DTMF tone is provided to the audio interpreter node 154 via the control/data bus 222, packet network interface 230, and link 153. Upon receipt of the signal, the audio interpreter node 154 recognizes that the signal has been received during the pause following the selected link, and the audio interpreter node 154 generates a request for the WWW document identified by the URL http://www.abc.com/cars.html, which is associated with the selected link. Alternatively, audio user input for selecting a hypertext link may be in the form of a speech signal.
Another type of link is a hypertext anchor link. An anchor link allows a user to jump to a particular location within a single HTML document. In conventional graphical browsers, when a user selects an anchor link, the browser displays the portion of the document indicated by the link. In accordance with the audio browsing techniques of the present invention, if a user selects an anchor link, the audio interpreter node 154 will begin interpreting the document at the point specified by the link. For example, line 620 of document 600 contains a hypertext anchor to the portion of the document at line 625. This hypertext link is identified to the user in a manner similar to that of the hypertext links which identify new HTML documents, as described above. The hypertext anchor links may be distinguished by, for example, a different audio tone or a generated speech signal identifying the link as an anchor link. If the user selects the anchor link at line 620, then the audio interpreter node 154 will skip down to the text at line 625 and will begin interpreting the HTML document 600 at that point.
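Resuming interpretation at an anchor can be sketched as scanning the document for the named anchor and continuing from that point. This is an illustrative sketch assuming the standard `<a name=...>` anchor syntax; the interpreter node's actual parsing is not specified at this level:

```python
import re

def resume_at_anchor(html_text, anchor_name):
    """Return the portion of the document starting at <a name="anchor_name">,
    or the whole document if the anchor is not found."""
    match = re.search(r'<a\s+name="%s"' % re.escape(anchor_name), html_text)
    return html_text[match.start():] if match else html_text

doc = ('<html><body>intro text '
       '<a name="details">the details section</a></body></html>')
tail = resume_at_anchor(doc, "details")
```

Interpretation (text to speech, link rendering) would then proceed over `tail` exactly as it does from the top of a document.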

The advantageous embodiment described above in conjunction with Fig. 1 is configured such that the audio browsing adjunct 150, including the audio processing node 152 and the audio interpreter node 154, is embodied in a telecommunications network node located within a long distance telecommunications network 102. This configuration provides the advantage that the audio browsing functions in accordance with the present invention can be provided to telephone network subscribers by the telephone network 102 service provider. In such a configuration, there is no additional hardware required at the user premises or at the document server. All audio browsing functions are provided by components within the telephone network 102. However, alternate configurations are possible and such alternate configurations could be readily implemented by one skilled in the art in view of the present disclosure.
One such alternate configuration is shown in Fig. 7, in which the functions of the audio browsing adjunct are shown implemented at a user interface device 700. In such an embodiment, the functions of the audio processing node 152, along with the functions of the audio interpreter node 154, are integrated within the single user interface device 700. The user interface device 700 communicates with the document server 160 through a communication link 702. Link 702 is similar to link 164 which was described above in connection with Fig. 1. Thus, link 702 may be a socket connection over TCP/IP, the establishment of which is well known in the art. User interface device 700 is shown in further detail in Fig. 8. User interface device 700 comprises a keypad/keyboard 802 and a microphone 804 for accepting user input, and a speaker 806 for providing audio output to the user. The user interface device 700 also comprises a keypad/keyboard interface module 816 connected to a control/data bus 824. The user interface device 700 also comprises a codec 810, a speech recognition module 818, a text to speech module 820, and an audio play/record module 822, each of which is connected to an audio bus 808 and the control/data bus 824 as shown in Fig. 8. The codec 810 contains an analog to digital converter 812 and a digital to analog converter 814, both of which are controlled by a central processing unit 826 via the control/data bus 824. The analog to digital converter 812 converts analog audio user input from microphone 804 into digital audio signals and provides the digital audio signals to the audio bus 808. The digital to analog converter 814 converts digital audio signals from the audio bus 808 to analog audio signals to be sent to the speaker 806. The keypad/keyboard interface module 816 receives input from the keypad/keyboard 802 and provides the input to the control/data bus 824. The speech recognition module 818, the text to speech module 820, and the audio play/record module 822 perform the same functions, and are similarly configured, as modules 214, 216, and 218, respectively, which were described above in conjunction with Fig. 2. In addition, the user interface device 700 contains a packet network interface 834 for connecting to a packet network, such as the Internet, via link 702. Further, the user interface device 700 contains central processing unit 826 and a memory unit 828, both of which are connected to the control/data bus 824. The overall functioning of the user interface device 700 is controlled by the central processing unit 826. Central processing unit 826 operates under control of executed computer program instructions 830 which are stored in memory unit 828. Memory unit 828 also contains data 832.
The user int~rf~e device 700 implements the ~unctions of the audio processing node 152 and the audio interpreter node 154, which were described above in conjunction with the embodiment of Fig. 1. These functions are implement~?d by the central processing unit 826 executing co.ll~uh. program instructions 830. Thus, the computer program instructions 830 would include program instructions which are the same as, or similar to: 1) colllyulei prograrn instructions 232 implementing the functions of the audio proc~ssing node 152; and 2) colllpuL~. program instructions 312 implem~nting the ~mction~ of the audio hlte.~lelcl node 154. The fimctioning of the audio processin~ node 152 and the audio i.lL~ . Ler node 154 were described in detail above, and will not be described in further detail here. Central proce~in~ unit 836 is capable of executing multiple processes at the sarne time, and in this way impl.orn~nt~
the functions of the audio proce~ing node 152 and the audio inte.~lel~. node 154.
This multiprocess fimctioning is illllstr~t~d in ~ig. 8 where the central processing unit 826 is shown executing audio illL~.~.e~illg/browsing process 836 and audio processing process 838.

In operation, a user of user interface device 700 would request a URL using keypad/keyboard 802 or microphone 804. If the keypad/keyboard 802 is used to request a URL, the keypad/keyboard interface module 816 would provide the requested URL to the central processing unit 826 via the control/data bus 824. If the microphone 804 is used to request a URL, the user's voice is received by microphone 804, digitized by analog to digital converter 812, and passed to the speech recognition module 818 via the audio bus 808. The speech recognition module 818 would then provide the requested URL to the central processing unit 826 via the control/data bus 824.

Upon receipt of the URL, the central processing unit 826 initiates an audio browsing/interpreting session by instantiating an audio interpreting/browsing process 836. The audio interpreting/browsing process 836 sends an HTTP request to the document server 160 via the packet network interface 834 in a manner similar to that described above in conjunction with the embodiment of Fig. 1. Upon receipt of the document from document server 160, the audio interpreting/browsing process 836 interprets the document in accordance with the audio browsing techniques of the present invention. The audio resulting from the interpretation of the document is provided to the user via the speaker 806 under control of the audio processing process 838. Similarly, a user of the user interface device 700 can provide audio user input to the user interface device via the microphone 804.

Since the audio interpreting/browsing process 836 and the audio processing process 838 are co-resident in the user interface device 700, all communications between the two processes take place through the central processing unit 826 via inter-process communication, and all communication between the processes 836, 838 and other elements of the user interface device 700 takes place via the control/data bus 824.
Figs. 7 and 8 show the user interface device 700 communicating directly with the document server 160 in the packet network 162. Alternatively, the user interface device 700 could be configured to communicate with the document server 160 via a standard telephone connection. In such a configuration, the packet network interface 834 would be replaced with a telephone interface circuit, which would be controlled by central processing unit 826 via control/data bus 824. User interface device 700 would then initiate a telephone call to the document server via the telephone network. The document server 160 would terminate the call from the user interface device 700 using hardware similar to the telephone network interface module 210 (Fig. 2). Alternatively, the call could be terminated within the telephone network, with the termination point providing a packet network connection to the document server 160.
In an alternate configuration shown in Fig. 9, the functions of the audio browsing adjunct 150 (including the functions of the audio processing node 152 and the audio interpreter node 154) and the document server 160 are implemented within an audio browsing document server 900. As illustrated in Fig. 9, calls are routed from a telephone 110, through LEC 120, switch 130, and another LEC 902, to the audio browsing document server 900. Thus, in this particular embodiment, the audio browsing document server 900 could be reached from a conventional telephone 110 via a telephone network. In addition, the audio browsing document server 900 is also connected to the Internet via a link 904. The audio browsing document server 900 is shown in further detail in Fig. 10. The audio browsing document server 900 comprises a telephone network interface module 1010, a DTMF decoder/generator 1012, a speech recognition module 1014, a text to speech module 1016, and an audio play/record module 1018, each of which is connected to an audio bus 1002 and a control/data bus 1004, as shown in Fig. 10. Each of these modules 1010, 1012, 1014, 1016, and 1018 performs the same functions, and is similarly configured, as modules 210, 212, 214, 216, and 218, respectively, which were described above in conjunction with Fig. 2. In addition, the audio browsing document server 900 contains a packet network interface 1044 for connecting to a packet network, such as the Internet, via link 904. The packet network interface 1044 is similar to the packet network interface 230 described above in conjunction with Fig. 2. Further, the audio browsing document server 900 contains a central processing unit 1020 and a memory unit 1030, both of which are connected to the control/data bus 1004. The overall functioning of the audio browsing document server 900 is controlled by the central processing unit 1020. Central processing unit 1020 operates under control of executed computer program instructions 1032 which are stored in memory unit 1030. Memory unit 1030 also contains data 1034, HTML documents 1036, audio-HTML documents 1038, audio files 1040, and graphics files 1042.
The audio browsing document server 900 implements the functions of the audio processing node 152, the audio interpreter node 154, and the document server 160, which were described above in conjunction with the embodiment of Fig. 1. These functions are implemented by the central processing unit 1020 executing computer program instructions 1032. Thus, the computer program instructions 1032 would include program instructions which are the same as, or similar to: 1) computer program instructions 232 implementing the functions of the audio processing node 152; 2) computer program instructions 312 implementing the functions of the audio interpreter node 154; and 3) computer program instructions 416 implementing the functions of the document server 160. The functioning of the audio processing node 152, the audio interpreter node 154, and the document server 160 were described in detail above, and will not be described in further detail here. Central processing unit 1020 is capable of executing multiple processes at the same time, and in this way implements the functions of the audio processing node 152, the audio interpreter node 154, and the document server 160. This multiprocess functioning is illustrated in Fig. 10 where the central processing unit 1020 is shown executing audio interpreting/browsing process 1022, document serving process 1024, and audio processing process 1026.
In operation, a call placed by telephone 110 to a telephone number associated with information accessible through the audio browsing document server 900 is routed to the audio browsing document server 900 via LEC 120, switch 130, and LEC 902. It is noted that a plurality of telephone numbers may be associated with various information accessible through the audio browsing document server 900, and each such telephone number would be routed to the audio browsing document server 900. The ringing line is detected through the telephone network interface module 1010 under control of the audio processing process 1026. Upon detection of the call, the central processing unit 1020 performs a lookup to determine the URL which is associated with the dialed number (DN). The DN is provided to the audio browsing document server 900 from the LEC 902 in a manner which is well known in the art. A list of DNs with associated URLs is stored as data 1034 in memory 1030. Upon receipt of the URL associated with the DN, the central processing unit 1020 initiates an audio browsing/interpreting session by instantiating an audio interpreting/browsing process 1022. The audio interpreting/browsing process 1022 sends an HTTP request to the document serving process 1024 which is co-executing on the central processing unit 1020. The document serving process 1024 performs the document server functions as described above in conjunction with document server 160 in the embodiment shown in Fig. 1. These document server functions are supported by the HTML documents 1036, audio-HTML documents 1038, audio files 1040, and graphics files 1042 stored in memory 1030. Thus, the central processing unit 1020 retrieves the document associated with the URL from memory 1030. The audio interpreting/browsing process 1022 then interprets the document in accordance with the audio browsing techniques of the present invention. The audio resulting from the interpretation of the document is provided to the user under control of the audio processing process 1026. Similarly, a user of telephone 110 can provide audio user input to the audio browsing document server 900 in a manner similar to that described above in conjunction with the embodiment of Fig. 1.
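The call flow of this integrated embodiment, a DN lookup followed by an in-process document fetch instead of a network request, can be sketched as follows (the class name, table, and document contents are illustrative):

```python
class AudioBrowsingDocumentServer:
    """Sketch of the integrated adjunct plus document server of Figs. 9-10."""

    def __init__(self, dn_to_url, documents):
        self.dn_to_url = dn_to_url  # data 1034: DN -> URL table
        self.documents = documents  # memory 1030: URL -> document

    def handle_call(self, dialed_number):
        """On ring detection, resolve the DN and serve the document locally."""
        url = self.dn_to_url[dialed_number]
        # In-process retrieval: no wide-area network hop is needed.
        return self.documents[url]

server = AudioBrowsingDocumentServer(
    {"18005550000": "http://example.com/greeting"},
    {"http://example.com/greeting": "<body>Hello!</body>"},
)
page = server.handle_call("18005550000")
```

The returned document would then feed the co-resident interpreting/browsing and audio processing processes.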
Since the audio interpreting/browsing process 1022, the document serving process 1024, and the audio processing process 1026 are co-resident in the audio browsing document server 900, all communications between the processes 1022, 1024, 1026 take place through the central processing unit 1020 via inter-process communication, and all communication between the processes 1022, 1024, 1026 and other elements of the audio browsing document server 900 takes place via the control/data bus 1004. One advantage of this embodiment is efficiency, in that HTML documents and other data do not need to traverse a potentially unreliable wide-area network in order to be processed (e.g., interpreted).
In the embodiment shown in Fig. 1, the audio processing node 152 and the audio interpreter node 154 were collocated. However, the functions of the audio processing node 152 and the audio interpreter node 154 may be geographically separated as shown in Fig. 11. In such an embodiment, the audio processing node 152 is contained within the telecommunications network 102 and an audio interpreter document server 1100 is contained within the packet network 162. The functioning of the audio processing node 152 is as described above in conjunction with the embodiment of Fig. 1. The audio interpreter document server 1100, which implements the functions of a document server, such as document server 160, and the functions of the audio interpreter node 154, is shown in further detail in Fig. 12. The audio interpreter document server 1100 contains a packet network interface 1202 connected to link 153 and to a control/data bus 1204. The audio interpreter document server 1100 contains a central processing unit 1206 and a memory unit 1212, both of which are connected to the control/data bus 1204. The overall functioning of the audio interpreter document server 1100 is controlled by the central processing unit 1206. Central processing unit 1206 operates under control of executed computer program instructions 1214 which are stored in memory unit 1212. Memory unit 1212 also contains data 1216, HTML documents 1218, audio-HTML documents 1220, audio files 1222, and graphics files 1224.
The audio interpreter document server 1100 implements the functions of the audio interpreter node 154 and the document server 160, which were described above in conjunction with the embodiment of Fig. 1. These functions are implemented by the central processing unit 1206 executing computer program instructions 1214. Thus, the computer program instructions 1214 would include program instructions which are the same as, or similar to: 1) computer program instructions 312 implementing the functions of the audio interpreter node 154; and 2) computer program instructions 416 implementing the functions of the document server 160. The functioning of the audio interpreter node 154 and the document server 160 were described in detail above, and will not be described in further detail here. Central processing unit 1206 is capable of executing multiple processes at the same time, and in this way implements the functions of the audio interpreter node 154 and the document server 160. This multiprocess functioning is illustrated in Fig. 12 where the central processing unit 1206 is shown executing audio interpreting/browsing process 1208 and document serving process 1210.
In operation, the audio processing node 152 communicates with the audio interpreter document server 1100 over link 153 in a manner similar to that described above in conjunction with Fig. 1. However, unlike Fig. 1, in which the audio interpreter node 154 communicated with the document server via link 164, the audio interpreting/browsing process 1208 communicates with the document serving process 1210 through the central processing unit 1206 via inter-process communication.
Thus, as described above, the audio browsing aspects of the present invention may be implemented in various ways, such that the audio processing functions, the audio interpreting/browsing functions, and the document serving functions may be integrated or separate, depending on the particular configuration. One skilled in the art would recognize that there are other possible configurations for providing the audio browsing functions of the present invention.
As can be seen from the above description, the present invention may be used in conjunction with standard HTML documents which are generally intended to be used with conventional graphics browsers, or with audio-HTML documents which are created specifically for use in accordance with the audio browsing features of the present invention.
With respect to the audio interpretation of standard HTML documents, many standard text to speech conversion techniques may be used. The following section describes the techniques which may be used to convert standard HTML documents into audio data. The techniques described herein for converting HTML documents into audio data are exemplary only, and various other techniques for converting HTML documents into audio signals could be readily implemented by one skilled in the art given this disclosure.
Standard text passages are interpreted using conventional text to speech conversion techniques which are well known. The text is interpreted as it is encountered in the document, and such interpretation continues until the user supplies audio input (e.g. to answer a prompt or follow a link), or a prompt is reached in the document. The end of a sentence is interpreted by adding a pause to the audio, and paragraph marks <p> are interpreted by inserting a longer pause. Text styles may be interpreted as follows.

STYLE                         GENERATED AUDIO
<EM>text</EM>                 Read text with increased volume.
<CITE>text</CITE>             Read text as an independent unit (e.g. using
                              inflection and setting off with pauses).
<DFN>word</DFN>               Read text as an independent unit (e.g. using
                              inflection and setting off with pauses).
<CODE>computer code</CODE>    Read punctuation literally and spell out
                              identifiers. If the language of the computer
                              code can be determined, then special reading
                              modes might be applied. For example, C
                              functions might be identified as such.
<KBD>text</KBD>               Read text as usual.
<SAMP>text</SAMP>             Read text as usual.
<STRONG>text</STRONG>         Read text at higher volume.
<VAR>variablename</VAR>       Read variable using a different voice.

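The table above can be expressed as a simple lookup from style tag to rendering hints. The following Python sketch is illustrative only: the tag set is taken from the table, but the hint vocabulary, dictionary name, and function name are hypothetical and not part of the original disclosure.

```python
# Hypothetical mapping of HTML style tags to audio rendering hints,
# following the table above. Tags not listed are read as usual.
STYLE_RENDERING = {
    "EM":     {"volume": "increased"},
    "STRONG": {"volume": "increased"},
    "CITE":   {"unit": "independent", "pauses": True},
    "DFN":    {"unit": "independent", "pauses": True},
    "CODE":   {"punctuation": "literal", "spell_identifiers": True},
    "KBD":    {},   # read text as usual
    "SAMP":   {},   # read text as usual
    "VAR":    {"voice": "different"},
}

def rendering_hints(tag: str) -> dict:
    """Return audio rendering hints for an HTML style tag (default: read as usual)."""
    return STYLE_RENDERING.get(tag.upper(), {})
```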
Image instructions are specifications in HTML which indicate that a particular image is to be inserted into the document. An example of an HTML image instruction is as follows:
<IMG SRC="http://machine.att.com/image.gif" ALT="[image of car]">

This instruction indicates that the image file "image.gif" is to be retrieved from the machine defined in the URL and displayed by the client browser. Certain conventional graphic browsers do not support image files, and therefore, HTML image instructions sometimes include alternate text to be displayed instead of the image. In the above example, the text "image of car" is included as an alternative to the image file. In accordance with the audio browsing techniques of the present invention, if an image instruction contains a text alternative, then the text is processed and converted to speech and the speech signal is provided to the user. Thus, in this example, the speech signal "image of car" would be provided to a user at telephone 110. If no text alternative is provided, then a speech signal is generated indicating that an image with no text alternative was encountered (e.g. "A picture without an alternative description").
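The ALT-text fallback just described can be sketched as follows. The pattern handles only the simple single-tag form shown in the example, and the function name is hypothetical; a full interpreter would use a real HTML parser.

```python
import re

# Extract the bracketed or plain ALT text from a single IMG instruction.
ALT_RE = re.compile(r'ALT="\[?([^"\]]*)\]?"', re.IGNORECASE)

def image_to_speech_text(img_tag: str) -> str:
    """Return the text to speak in place of an HTML image instruction."""
    m = ALT_RE.search(img_tag)
    if m and m.group(1).strip():
        return m.group(1).strip()
    # No usable text alternative was provided.
    return "A picture without an alternative description"
```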
Conventional HTML contains instructions which support the entering of user input. For example, the following instructions:
<SELECT NAME = "selectvar">
<OPTION> mary
<OPTION SELECTED> joe
</SELECT>
request that the user select from two options: mary or joe, with the option joe being selected as a default. In a client executing a conventional graphical browser, these options may be presented, for example, in a pull-down menu. In accordance with the audio browsing techniques of the present invention, the above instructions would be translated into speech signals as follows:
"Please select onc ~7f the following: Option mary (pause) Option joe currently selected (pause) en~ of opfions. Press *r ~o repeat these options, press # to contin2~e ".

If the user presses the pound key during the pause after a given option, that option is selected. Whichever item is selected when the user chooses to continue is returned to the document server associated with the variable selectvar. As an alternative to the user making selections with DTMF signals, the user could select the options using voice signals.
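Generating this spoken prompt from the SELECT/OPTION sequence can be sketched as follows. This is a minimal illustration assuming the simple one-option-per-line form of the example; the regular expression and function name are our own, not part of the original disclosure.

```python
import re

# Match each OPTION, capturing whether it carries the SELECTED attribute.
OPTION_RE = re.compile(r'<OPTION(\s+SELECTED)?>\s*(\w+)', re.IGNORECASE)

def select_to_prompt(html: str) -> str:
    """Translate a SELECT/OPTION sequence into the spoken prompt text."""
    parts = ["Please select one of the following:"]
    for selected, label in OPTION_RE.findall(html):
        if selected:
            parts.append(f"Option {label} currently selected (pause)")
        else:
            parts.append(f"Option {label} (pause)")
    parts.append("end of options. Press *r to repeat these options, press # to continue.")
    return " ".join(parts)
```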
Another conventional HTML instruction for entering user input is a checkbox instruction. For example, the sequence of instructions:
<INPUT TYPE="checkbox" NAME="varname" VALUE="red" CHECKED>
<INPUT TYPE="checkbox" NAME="varname" VALUE="blue">
<INPUT TYPE="checkbox" NAME="varname" VALUE="green">

W ~ 97140611 PCTrUS97/03690 would result in the following being displayed by a conventional graphics browser:

[x] red   [ ] blue   [ ] green

The default is that the red box is checked. The user would be able to change this default by checking either the blue or green box. In accordance with the audio browsing techniques of the present invention, the above sequence of instructions would be processed into a speech signal provided to the user as follows:
The following selections may be toggled by pressing # during the pause: red currently checked (pause), blue (pause), green (pause).
Press *r to repeat this list or # to continue.

By pressing the # key to generate a DTMF signal during a pause, the user can toggle the item preceding the pause. A second press of the # key will move the user out of the input sequence. The user may press *r to repeat the list of options. As an alternative to DTMF audio input, the user may select the checkbox options using voice signal input.
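The toggle behavior can be modeled as follows. The one-event-per-pause interaction model is a simplification introduced here for illustration (it omits, for instance, the repeat and exit keys), and the function name is hypothetical.

```python
def run_checkbox_dialogue(items, checked, key_events):
    """Model the checkbox dialogue: items are read in order, and a '#'
    during the pause after an item toggles it.

    items:      option values in reading order
    checked:    set of initially checked values
    key_events: key_events[i] is True if the user pressed '#' during
                the pause after items[i]
    Returns the final set of checked values.
    """
    state = set(checked)
    for item, pressed in zip(items, key_events):
        if pressed:
            state.symmetric_difference_update({item})  # toggle the item
    return state
```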
Conventional HTML documents can request user textual input using a TEXTAREA instruction as follows:
<TEXTAREA COLS=60 ROWS=4 NAME="textvar"> Add text here </TEXTAREA>
which, in a conventional graphics browser, would result in the text "Add text here" being displayed followed by a text box of 60 columns by 4 rows being presented to the user for textual input. In accordance with the audio browsing techniques of the present invention, the above instruction would be interpreted as follows. The COLS and ROWS parameters are ignored, and the user is provided with audio:
"Add text here".
The user could then enter DTMF tones followed by the # sign. These DTMF signals would be processed with the results being supplied to the document server associated with the variable "textvar". Alternatively, the user could supply the text by speaking the response into the microphone of telephone 110, and the speech is converted into data by the speech recognition module 214 and the data is supplied to the document server 160 associated with the variable "textvar".
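The TEXTAREA handling just described (read the default text, ignore COLS and ROWS) can be sketched as follows; the regular expression handles only the single-tag form of the example, and the function name is hypothetical.

```python
import re

# Capture the default text between the TEXTAREA tags; the COLS and ROWS
# attributes are simply not consulted.
TEXTAREA_RE = re.compile(r'<TEXTAREA[^>]*>\s*(.*?)\s*</TEXTAREA>',
                         re.IGNORECASE | re.DOTALL)

def textarea_to_prompt(html: str) -> str:
    """Return the default text of a TEXTAREA to be read to the user."""
    m = TEXTAREA_RE.search(html)
    return m.group(1) if m else ""
```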
As seen from the above, various techniques can be used such that conventional HTML documents can be browsed in accordance with the audio browsing techniques of the present invention.
In order to more fully exploit the advantages of audio browsing in accordance with the present invention, additional document instructions may be used in addition to the conventional HTML instructions. These instructions, called audio-HTML instructions, may be introduced into conventional HTML documents. These audio-HTML instructions are described below.
A voice source instruction:
<VOICE SRC="//www.abc.com/audio.file">
results in the specified file being played to the user. Such an instruction was described in detail in conjunction with line 512 of the example document 500 of Fig. 5.
A collect name instruction:
<COLLECT NAME="collectvar">
specifies the beginning of a prompt-and-collect sequence. Such a collect name instruction is followed by a prompt instruction and a set of choice instructions. When the user makes a choice, as indicated by audio user input, the results of the user choice are supplied to the document server associated with the variable collectvar. The collect name instruction, along with an associated prompt-and-collect sequence, is described in detail in conjunction with lines 514-524 of the example document 500 of Fig. 5.
A DTMF input instruction:
<INPUT TYPE="DTMF" MAXLENGTH="5" NAME=varname>
indicates that audio user input in the form of DTMF signals is expected from the user.
This instruction causes the audio browsing adjunct 150 to pause and wait for DTMF input from the user. The user inputs a DTMF sequence by pressing keys on the keypad of telephone 110, with the end of the sequence indicated by pressing the # key. The DTMF input is processed as described above in conjunction with the example HTML document 500. The decoded DTMF signal is then supplied to the document server associated with the variable varname. The MAXLENGTH parameter indicates the maximum length (number of DTMF inputs) that is allowed for the input. If the user enters more than the maximum number of DTMF keys (in this example 5), then the system ignores the excess input.
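The #-terminated collection with MAXLENGTH truncation can be sketched as follows (an illustrative model; the function name is hypothetical).

```python
def collect_dtmf(keys, maxlength=5):
    """Collect a DTMF key sequence terminated by '#'.

    keys: iterable of DTMF key presses.
    Returns the collected digit string, truncated to maxlength.
    """
    digits = []
    for key in keys:
        if key == "#":          # '#' marks the end of the sequence
            break
        if len(digits) < maxlength:
            digits.append(key)  # excess input beyond maxlength is ignored
    return "".join(digits)
```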
In a similar manner, the SPEECH input instruction:
<INPUT TYPE="SPEECH" MAXLENGTH="5" NAME=varname>
indicates that audio user input in the form of a speech signal is expected from the user.
This instruction causes the audio browsing adjunct 150 to pause and to wait for speech input from the user. The user inputs a speech signal by speaking into the microphone of telephone 110. The speech input is processed as described above in conjunction with the example HTML document 500. The speech signal is then supplied to the document server associated with the variable varname. The MAXLENGTH parameter indicates that the maximum length of the speech input is 5 seconds.
The audio-HTML instructions described herein are exemplary of the types of audio-HTML instructions which may be implemented to exploit the advantages of the audio browsing techniques of the present invention. Additional audio-HTML instructions could be readily implemented by one skilled in the art given this disclosure.

In addition to the above described audio-HTML instructions, the audio browsing adjunct 150 supports various navigation instructions. In conventional graphic browsers, users may use conventional techniques for navigating through a document. Such conventional techniques include text sliders for scrolling through a document, cursor movement, and instructions such as page up, page down, home, and end. In accordance with the audio browsing techniques of the present invention, users may navigate through documents using audio user input, either in the form of DTMF tones or speech, as follows.

DTMF      SPEECH     NAVIGATION RESPONSE
COMMAND   COMMAND
*8        Top        Jump to beginning of document
*3        End        Jump to end of document
*6        Next       Jump to beginning of next prompt sequence
*7        Skip       Jump to next option, link, definition or other list item
*5        List       List all links within a document, with a pause following
                     each link allowing the user to specify a selection of the link

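The navigation table can be modeled as a lookup resolving either form of a command (DTMF sequence or spoken word) to a single navigation action. The action names below are hypothetical labels, not part of the original disclosure.

```python
# Pair each (DTMF, speech) command from the table with an action name.
NAVIGATION = {
    ("*8", "top"):  "jump_to_beginning_of_document",
    ("*3", "end"):  "jump_to_end_of_document",
    ("*6", "next"): "jump_to_next_prompt_sequence",
    ("*7", "skip"): "jump_to_next_list_item",
    ("*5", "list"): "list_links",
}

def navigation_action(command: str):
    """Resolve a DTMF sequence or spoken word to a navigation action, or None."""
    command = command.lower()
    for (dtmf, word), action in NAVIGATION.items():
        if command in (dtmf, word):
            return action
    return None
```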
The foregoing Detailed Description is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention.
For example, although certain of the communication channels have been described herein as packet switched communication channels, such communication channels could also be implemented as circuit switched communication channels.

Claims (35)

What is claimed is:
1. A method for providing audio access to information stored at a server comprising the steps of:
establishing an audio channel between an audio interface device and a telecommunications network node;
establishing a document serving protocol channel between said telecommunications network node and said server;
receiving a document at said telecommunications network node from said server via said document serving protocol channel;
interpreting said received document into audio data at said telecommunications network node; and transmitting said audio data from said telecommunications network node to said audio interface device via said audio channel.
2. The method of claim 1 wherein said audio interface device is a telephone, said step of establishing an audio channel further comprising the steps of:
receiving a telephone call placed to a telephone number associated with said server;
routing said telephone call to said telecommunications network node.
3. The method of claim 1 wherein said server is a WWW document server and wherein said document serving protocol is hypertext transfer protocol.
4. The method of claim 1 wherein said document includes HTML
instructions.
5. The method of claim 4 wherein said document further comprises audio-HTML instructions.
6. The method of claim 1 further comprising the steps of:
receiving at said telecommunications network node audio user input from said audio interface device via said audio channel;
interpreting said audio user input at said telecommunications network node into user data appropriate for transmitting via said document serving protocol; and transmitting said user data to said server via said document serving protocol channel.
7. The method of claim 6 wherein said audio user input is DTMF tones.
8. The method of claim 6 wherein said audio user input is speech signals.
9. A system for accessing information stored at a server comprising:
a telecommunications network node for receiving a call placed from an audio interface device to a telephone number associated with said server, wherein an audio channel is established between said telecommunications network node and said audio interface device;
a database accessible by said telecommunications network node for associating said telephone number with said server;
means associated with said telecommunications network node for establishing a document serving protocol channel between said telecommunications network node and said server;

an interpreter associated with said telecommunications network node for interpreting a document received from said server via said document serving protocol channel into audio data, and means associated with said telecommunications network node for transmitting said audio data to said audio interface device via said audio channel.
10. The system of claim 9 wherein said audio interface device is a telephone.
11. The system of claim 9 wherein:
said interpreter is further configured to interpret audio user input received from said audio interface device via said audio channel into user data appropriate for transmission via said document serving protocol; and said system further comprising means for transmitting said user data to said server via said document serving protocol channel.
12. The system of claim 11 wherein said audio user input is DTMF tones.
13. The system of claim 11 wherein said audio user input is speech signals.
14. The system of claim 9 wherein said server is a WWW document server and wherein said document serving protocol is hypertext transfer protocol.
15. The system of claim 9 wherein said document includes HTML
instructions.
16. The system of claim 15 wherein said document further comprises audio-HTML instructions.
17. The system of claim 9 wherein said database comprises data associating telephone numbers with Uniform Resource Locators.
18. A method for providing audio access to information stored at a server which serves documents in accordance with a document serving protocol, said method comprising the steps of:
establishing a communication channel between an audio interface device and said server;
interpreting documents provided by said server into audio data; and providing said audio data to said audio interface device.
19. The method of claim 18 wherein said step of interpreting takes place at said server.
20. The method of claim 19 wherein said document serving protocol is hypertext transfer protocol.
21. The method of claim 18 wherein said step of interpreting takes place at said audio interface device.
22. The method of claim 18 wherein said step of interpreting takes place at an intermediate node in said communication channel disposed between said server and said audio user interface.
23. The method of claim 18 wherein said document serving protocol is hypertext transfer protocol.
24. The method of claim 18 further comprising the steps of:
interpreting audio user input received from said audio interface device into instructions compatible with said document serving protocol; and providing said instructions to said server.
25. A system for interpreting information between a server operating in accordance with a document serving protocol and an audio interface device, wherein said server and said audio interface device are connected by a communications channel, said system comprising:
means for receiving a document served by said server via said document serving protocol;
an interpreter for interpreting said received document into audio data; and means for providing said audio data to said audio interface device.
26. The system of claim 25 wherein said audio interface device is a telephone, said system further comprising means for establishing said communication channel, said means for establishing said communication channel comprising:
means for receiving a telephone call placed from said telephone to a telephone number associated with said server; and a database for associating said telephone number with said server.
27. The system of claim 25 wherein said interpreter is located at a node disposed between said audio interface device and said server within said communication channel.
28. The system of claim 25 wherein said interpreter is located within said document server.
29. The system of claim 25 wherein said interpreter is located within said audio interface device.
30. The system of claim 25 wherein:
said interpreter is further configured to interpret audio user input received from said audio interface device into instructions appropriate for transmittal in accordance with said document serving protocol; and said system further comprising means for providing said instructions to said document server.
31. A document server for providing audio access to stored documents comprising:
an interface for connection with a communication link, said communication link providing communication with an audio interface device;
a machine readable storage device storing computer program instructions and said documents;
a central processing unit connected to said memory and said interface for executing said computer program instructions, said computer program instructions causing the central processing unit to perform the steps of:
in response to receipt of a request for a document, retrieving said requested document from said machine readable storage device in accordance with a document serving protocol;
interpreting said requested document into audio data; and transmitting said audio data to said audio interface device via said interface.
32. The document server of claim 31 wherein said document serving protocol is hypertext transfer protocol
33. The document server of claim 31 wherein said communication link is a telephone network connection, said document server further comprising:
a telephone network interface.
34. The document server of claim 31 wherein said communication link is a packet network connection, said document server further comprising:
a packet network interface.
35. The document server of claim 31 wherein said computer program instructions further cause the central processing unit to perform the steps of:
in response to audio user input received from said audio interface device via said communication link, interpreting said audio user input into user data; and in response to said user data, retrieving a document from said machine readable storage device in accordance with a document serving protocol.
CA002224712A 1996-04-22 1997-03-18 Method and apparatus for information retrieval using audio interface Abandoned CA2224712A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US63580196A 1996-04-22 1996-04-22
US08/635,801 1996-04-22

Publications (1)

Publication Number Publication Date
CA2224712A1 true CA2224712A1 (en) 1997-10-30

Family

ID=24549170

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002224712A Abandoned CA2224712A1 (en) 1996-04-22 1997-03-18 Method and apparatus for information retrieval using audio interface

Country Status (7)

Country Link
EP (1) EP0834229A1 (en)
JP (1) JPH11510977A (en)
KR (1) KR19990028327A (en)
CA (1) CA2224712A1 (en)
IL (1) IL122647A (en)
MX (1) MX9710150A (en)
WO (1) WO1997040611A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8332886B2 (en) 2006-03-28 2012-12-11 Michael Lanza System allowing users to embed comments at specific points in time into media presentation

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2317070A (en) * 1996-09-07 1998-03-11 Ibm Voice processing/internet system
WO1998035491A1 (en) * 1997-02-05 1998-08-13 British Telecommunications Public Limited Company Voice-data interface
EP1062798A1 (en) * 1998-03-10 2000-12-27 Siemens Corporate Research, Inc. A system for browsing the world wide web with a traditional telephone
US20010012350A1 (en) * 1998-03-11 2001-08-09 James Ehlinger Method and apparatus for facilitating internet access from a telecommunications network
US6675054B1 (en) * 1998-04-20 2004-01-06 Sun Microsystems, Inc. Method and apparatus of supporting an audio protocol in a network environment
AU5126999A (en) * 1998-07-24 2000-02-14 Motorola, Inc. Telecommunication audio processing systems and methods therefor
EP1133734A4 (en) * 1998-10-02 2005-12-14 Ibm Conversational browser and conversational systems
AU1557099A (en) * 1998-10-23 2000-05-15 Nokia Networks Oy Method and apparatus for distributing an audio or video information
AUPP713598A0 (en) * 1998-11-17 1998-12-10 Telstra R & D Management Pty Ltd A data access system and method
US7082397B2 (en) * 1998-12-01 2006-07-25 Nuance Communications, Inc. System for and method of creating and browsing a voice web
US6240391B1 (en) 1999-05-25 2001-05-29 Lucent Technologies Inc. Method and apparatus for assembling and presenting structured voicemail messages
US6459774B1 (en) 1999-05-25 2002-10-01 Lucent Technologies Inc. Structured voicemail messages
US6393107B1 (en) 1999-05-25 2002-05-21 Lucent Technologies Inc. Method and apparatus for creating and sending structured voicemail messages
FR2794924B1 (en) * 1999-06-08 2004-04-02 Aplio Sa METHOD AND SYSTEM FOR ACCESSING A MULTIMEDIA VOICE SERVER VIA AN INTERNET COMPUTER COMMUNICATION NETWORK
US7308462B1 (en) 1999-10-29 2007-12-11 Nortel Networks Limited Methods and systems for building and distributing audio packages
US7376710B1 (en) * 1999-10-29 2008-05-20 Nortel Networks Limited Methods and systems for providing access to stored audio data over a network
DE19959850A1 (en) * 1999-12-10 2001-06-13 Deutsche Telekom Ag Communication system and method for providing Internet access by telephone
US6424945B1 (en) 1999-12-15 2002-07-23 Nokia Corporation Voice packet data network browsing for mobile terminals system and method using a dual-mode wireless connection
EP1122636A3 (en) 2000-02-03 2007-11-14 Siemens Corporate Research, Inc. System and method for analysis, description and voice-driven interactive input to html forms
WO2001061894A2 (en) * 2000-02-18 2001-08-23 Penguinradio, Inc. Method and system for providing digital audio broadcasts and digital audio files via a computer network
SG98374A1 (en) * 2000-03-14 2003-09-19 Egis Comp Systems Pte Ltd A client and method for controlling communications thereof
JP4565585B2 (en) * 2000-04-13 2010-10-20 キヤノン株式会社 Data processing apparatus, data processing method, and recording medium
US6823311B2 (en) * 2000-06-29 2004-11-23 Fujitsu Limited Data processing system for vocalizing web content
US6751593B2 (en) * 2000-06-30 2004-06-15 Fujitsu Limited Data processing system with block attribute-based vocalization mechanism
EP1178656A1 (en) * 2000-08-02 2002-02-06 Passcall Advanced Technologies Ltd System and method for computerless surfing of an information network
EP1233590A1 (en) * 2001-02-19 2002-08-21 Sun Microsystems, Inc. Content provider for a computer system
EP1246439A1 (en) * 2001-03-26 2002-10-02 Alcatel System and method for voice controlled internet browsing using a permanent D-channel connection
FR2848053B1 (en) 2002-11-29 2005-04-01 Streamwide METHOD FOR PROCESSING AUDIO DATA ON A NETWORK AND DEVICE FOR IMPLEMENTING SAID METHOD
IT1394765B1 (en) * 2009-07-08 2012-07-13 Onering S R L DOCUMENT COLLECTION AND MANAGEMENT DEVICE, AND CONTROL OF THE USE OF SUCH DOCUMENTS, AND METHOD OF USE OF SUCH DEVICE

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8332886B2 (en) 2006-03-28 2012-12-11 Michael Lanza System allowing users to embed comments at specific points in time into media presentation

Also Published As

Publication number Publication date
JPH11510977A (en) 1999-09-21
EP0834229A1 (en) 1998-04-08
IL122647A (en) 2002-05-23
MX9710150A (en) 1998-07-31
KR19990028327A (en) 1999-04-15
IL122647A0 (en) 1998-08-16
WO1997040611A1 (en) 1997-10-30

Similar Documents

Publication Publication Date Title
CA2224712A1 (en) Method and apparatus for information retrieval using audio interface
KR100566014B1 (en) Methods and devices for voice conversation over a network using parameterized conversation definitions
JP4846756B2 (en) Method and apparatus for accessing individual video / audio web content via a wireless device
US6501956B1 (en) Providing blended interface for wireless information services
EP0889627B1 (en) Internet-enabled voice-response service
US6445694B1 (en) Internet controlled telephone system
US7054818B2 (en) Multi-modal information retrieval system
US7486664B2 (en) Internet controlled telephone system
CA2296810C (en) Method and apparatus for remotely accessing call origination information
US6285683B1 (en) Method and apparatus for providing extended capability telephone services via an automated server
US20060064499A1 (en) Information retrieval system including voice browser and data conversion server
US20020099799A1 (en) Method and system for universal and transparent access to heterogeneous resources
US6173042B1 (en) System for enabling personal computer access to an interactive voice response system
JP2000502849A (en) Access to communication services
US8364490B2 (en) Voice browser with integrated TCAP and ISUP interfaces
WO2002093402A1 (en) Method and system for creating pervasive computing environments
US6493434B1 (en) Update of web audio messages via audio user interface
AU7361998A (en) Methods and apparatus for creating automated servers for display telephones
EP1186154A1 (en) Combining telephony data with web pages display
US20040044726A1 (en) Service creation and provision using a java environment with a set of APIs for integrated networks called JAIN and a set of recommendations called the PARLAY API&#39;s
MXPA98002752A (en) Method and apparatus for voice interaction in a network, using definitions of interaction with paramet
CN101015192A (en) System and method for outbound calling from a distributed telecommunications platform
MXPA00005944A (en) Methodand apparatus for allowing selective disposition of an incoming telephone call during an internet session

Legal Events

Date Code Title Description
EEER Examination request
FZDE Discontinued