EP1952629A1

EP1952629A1 - Method and apparatus for synchronizing visual and voice data in dab/dmb service system

Info

Publication number: EP1952629A1
Application number: EP06823659A
Authority: EP
Inventors: Bong-Ho Lee; So-Ra Park; Hee-Jeong Kim; Kyu-Tae Yang; Chung-Hyun Ahn; Soo-In Lee
Original assignee: Electronics and Telecommunications Research Institute ETRI
Current assignee: Electronics and Telecommunications Research Institute ETRI
Priority date: 2005-11-21
Filing date: 2006-11-21
Publication date: 2008-08-06
Also published as: KR100862611B1; EP1952629A4; WO2007058517A1; KR20070053627A

Abstract

Provided is a digital broadcasting service, more particularly, a method for providing a Web service that can simultaneously input/output speech data along with visual data by integrating speech Web data with broadcasting Web sites (BWS) provided for a multimedia broadcasting in a digital multimedia broadcasting (DMB) system configured based on a digital audio broadcasting (DAB), and an apparatus thereof . The method for synchronizing visual data with speech data includes the steps of : a) generating a visual Web document; b) generating a speech Web document including synchronization tags related to the visual Web document; and c) identifying the speech Web document and the visual Web document based on a sub-channel or a directory and transmitting the speech Web document and the visual Web document independently.

Description

METHOD AND APPARATUS FOR SYNCHRONIZING VISUAL AND VOICE DATA IN DAB/DMB SERVICE SYSTEM

Description Technical Field

The present invention relates to a digital data broadcasting service; and, more particularly, to a method for providing a Web service that can simultaneously input/output speech data along with visual data by integrating speech Web data with broadcasting Web sites

(BWS) provided for a multimedia broadcasting in a Digital

Multimedia Broadcasting (DMB) system configured based on a Digital Audio Broadcasting (DAB), and an apparatus thereof.

Background Art

Conventional broadcasting Web sites (BWS) providing methods make Hyper Text Markup Language (HTML) contents, which is of a Web specification, data and transmit the data to provide a Web service on a screen by using a multimedia object transfer (MOT) method through a Digital Multimedia Broadcasting (DMB) network configured based on Digital Audio Broadcasting (DAB), and an apparatus thereof. The method, however, can simply output the Web data defined by the HTML onto the screen. Therefore, the method cannot sufficiently transfer data in a broadcasting system for a mobile environment, such as a DAB-based DMB. Also, an X + V method is underway for standardization and development to provide a multi-modal Web service. It is a method directed through a screen and it can provide a multi-modal Web service that can input/output speech data by combining the host language XHTML with forms in charge of speech interface of VoiceXML. However, the method, too, operates based on a visual interface with the XHTML as a host language, and it is somewhat inappropriate for a mobile environment.

The present invention provides a method for synchronizing visual and speech Web data that can overcome the aforementioned drawbacks and provide users with a speech-directed Web service in a mobile environment or a fixed location environment, instead of a visual-directed Web service, and an apparatus thereof.

Disclosure Technical Problem

It is, therefore, an object of the present invention to provide a broadcasting Web sites (BWS) service that can overcome the limitation of inputting/outputting only visual data of Web data in a conventional Digital Audio Broadcasting (DAB) BWS service or a conventional Digital Multimedia Broadcasting (DMB) BWS service and offer the speech enabled web data service, too.

In the first place, an embodiment of the present invention defines a speech-directed Web language to provide a speech-directed Web service in consideration of a mobile environment, instead of a screen-directed Web service .

Secondly, another embodiment of the present invention provides a service capable of inputting/outputting speech data by integrating a conventional Web service framework, e.g., a BWS service, with a speech input/output module.

Thirdly, yet another embodiment of the present invention provides a technology of synchronizing a content following a visual Web specification, e.g., HTML, and a VoiceXML content capable of providing a speech Web service, that is, a technology of synchronizing visual data with speech data. For this, processing of documents should be synchronized, and a user input device should be synchronized for one document. It is the object of the present invention to provide a method and apparatus for the synchronizations.

Technical Solution

In accordance with one aspect of the present invention, there is provided a method for synchronizing visual data with speech data to provide a broadcasting Web sites (BWS) service capable of simultaneously inputting/outputting speech data in a multimedia broadcasting service, the method which includes the steps of: a) generating a visual Web document; b) generating a speech Web document including synchronization tags related to the visual Web document; and c) identifying the speech Web document and the visual Web document based on a sub-channel or a directory and transmitting the speech Web document and the visual Web document independently.

In accordance with another aspect of the present invention, there is provided an apparatus for synchronizing visual data with speech data to provide a BWS service capable of simultaneously inputting/outputting speech data in a multimedia broadcasting service, the apparatus which includes: a) a content data generator for generating a visual Web document and a speech Web document including synchronization tags related to the visual Web document; b) a multimedia object transfer (MOT) server for transforming both the generated visual Web document and the speech Web document into an MOT protocol; and c) a transmitting system for identifying the speech Web document and the visual Web document of the MOT protocol based on a sub-channel or a directory and transmitting the speech Web document and the visual Web document independently . In accordance with another aspect of the present invention, there is provided a method for synchronizing visual data with speech data to provide a BWS service capable of simultaneously inputting/outputting speech data in a multimedia broadcasting service, the method which includes the steps of: a) receiving and loading a visual Web document and a speech Web document including synchronization tags related to the visual Web document, the visual Web document and the speech Web document being identified based on a sub-channel or a directory and transmitted independently; and b) analyzing the synchronization tags when a synchronization event occurs and performing a corresponding synchronization operation.

In accordance with yet another aspect of the present invention, there is provided an apparatus for synchronizing visual data with speech data to provide a BWS service capable of simultaneously inputting/outputting speech data in a multimedia broadcasting service, the apparatus which includes: a) a baseband receiver for receiving broadcasting signals through a multimedia broadcasting network and performing channel decoding; b) a multimedia object transfer (MOT) decoder for decoding channel-decoded packets and restoring a visual Web document and a speech Web document including synchronization tags related to the visual Web document; and c) an integrated Web browser for analyzing the synchronization tag when a synchronization event occurs and executing a corresponding synchronization operation.

Advantageous Effects When an HTML document, which is a visual Web document, is synchronized with a VoiceXML content, which is a speech Web document, by using the synchronization tags and it is possible to perform synchronized input/output, a multimedia broadcasting service user can conveniently access to corresponding information by receiving both screen output and speech output for a Web data service and, if necessary, making a command by speech even in a mobile environment.

In other words, the present invention has an advantage that it can ensure the backward compatibility with lower-ranked services by individually authoring and transmitting data to provide an integrated synchronization service, instead of integrating markup languages and transmitting them in the form of data of a sort, which is generally used. To synchronize two Web documents, the technology of the present invention adds synchronization-related elements to a host markup language to thereby maintain a conventional service framework. Thus, users can receive a conventional broadcasting Web site and, at the same time, access to the Web by speech, listen to information, and control the Web by speech.

Description of Drawings

The above and other objects and features of the present invention will become apparent from the following description of the preferred embodiments given in conjunction with the accompanying drawings, in which:

Fig. 1 is an exemplary view illustrating how broadcasting Web site documents are authored to be synchronized and capable of speech input/output in accordance with an embodiment of the present invention; Fig. 2 is a view describing broadcasting Web site documents capable of speech input/output and a data transmitting method in accordance with an embodiment of the present invention; Fig. 3 is an exemplary view showing broadcasting Web site documents capable of speech input/output when a synchronization document is separately provided in accordance with an embodiment of the present invention;

Fig. 4 is a block view describing a Digital Multimedia Broadcasting (DMB) system which is configured based on a Digital Audio Broadcasting (DAB) and providing a broadcasting Web sites (BWS) service capable of simultaneous speech input/output; and

Fig. 5 is a block view illustrating an integrated Web browser of Fig. 4.

Best Mode for the Invention

To have a look at conventional technologies, broadcasting Web sites (BWS) defined to provide a Web service in a multimedia broadcasting service, such as Digital Audio Broadcasting (DAB) and Digital Multimedia Broadcasting (DMB), is a specification for providing users with a Web site by authoring a Web content based on Hyper Text Markup Language (HTML) and transmitting the Web content through a multimedia broadcasting network. The Web language that becomes the basis for providing the service includes a basic profile which adopts HTML 3.2 as a Web specification in consideration of a terminal with a relatively low specification, and a non-restrictive profile which has no restriction in consideration of a high-specification terminal, such as a personal computer (PC). Since the profiles are based on the HTML, which is a Web representation language, it requires a Web browser to provide a terminal with a BWS service. The browser may be called a BWS browser and it provides a Web service by receiving and decoding Web contents of txt, html, jpg, and png formats transmitted as objects through a multimedia object transfer (MOT) . Generally, the output is provided in the visual form. That is, texts or still images are displayed on a screen with a hyperlink function and they transit into the other contents transmitted together through the MOT to thereby provide a visual-based local Web service. Of course, when the specification includes a function of recovering a speech file or other multimedia files, it is possible to provide the output not only on the screen but also by speech. However, with the current specification, which is the basic profile, it is possible to provide only a local Web service capable of visual input/output based on the Graphical User Interface (GUI).

Particularly, there are increasing demands for services that can provide a data service additionally while securing mobility in a mobile multimedia broadcasting such as the DAB and DMB and access to multimodal information, instead of a single-modal information. The World Wide Web Consortium (W3C) has completed developing VoiceXML and SALT specifications capable of providing speech-based Web services, and it is expected to embark in standardization of multi-modal Web specification additionally.

VoiceXML is a Web language devised for an interactive speech response service of an Interactive Voice Response (IVR) type. When it is actually mounted on the terminal, it can provide a speech enabled Web service. The technology defines a markup language that can be transited into another application, document, or dialogue based on a dialogue obtained by modeling a conversation between a human being and a machine. Differently from the conventional visual-based Web service, the VoiceXML can provide a Web service that can input/output data by speech. Also, Web information can be delivered by speech by applying a Text To Speech (TTS) technology which transforms text data into speech data and the Automatic Speech Recognition (ASR) technology which performs speech recognition to an input/output module, and user input data are received by speech to process a corresponding command or execute a corresponding application. The VoiceXML is effective in a mobile environment. It has an advantage that users listen to a Web service provided without a visual output on the screen and perform navigation by inputting speech data at desired information. However, there is a limitation in delivering Web information by speech only and, when speech input/output is made together with visual data on the screen, it is convenient and it is possible to provide diverse additional data services. For this, the present invention provides a transmission and synchronization method for providing a multi-modal Web service by integrating the conventional BWS Web specification, i.e., HTML, with a speech Web language, i.e., VoiceXML. Hereinafter, the transmission and synchronization method will be described.

The basic principle of the present invention is to generate a speech Web document including synchronization information related to a visual Web document and transmit the visual Web document and the speech Web document through another sub-channel or another directory of the same sub-channel.

Fig. 1 is an exemplary view illustrating how of broadcasting Web site documents are authored to be synchronized and capable of speech input/output in accordance with an embodiment of the present invention. As shown in Fig. 1, a visual Web document and a speech Web document are separately created in the embodiment of the present invention. The visual Web document is an HTML or an xHTML content defined in the BWS, whereas the speech Web document is a document integrating elements or tags in charge of synchronization between the VoiceXML and the visual Web documents, a speech recognition module, and a component-related module such as a speech combiner and a receiver. Fig. 2 is a view describing broadcasting Web site documents capable of speech input/output and a data transmitting method in accordance with an embodiment of the present invention.

Referring to Fig. 2, although the visual Web document and the speech Web document are transmitted and signaled through different sub-channels or the same subchannel, they are transmitted using different directories, This is to make a terminal capable of receiving an existing BWS service receive the conventional service, even if the BWS is cooperated with a speech Web document. The signaling for the speech BWS is additionally processed in the speech Web document, i.e., a speech module.

The synchronization between the visual Web document and the speech Web document is processed by using synchronization tags <esync>, <icync> and <fsync>. The synchronization tags are described in the speech Web document without exception. Also, the synchronization tags are identified by the following namespace.

<va version = "1.0" xmlns:va="http://www. worlddab.org/schemes/va" xsi:schemaLocation="http://www. worlddab.org/schemas/va va.xsd"> Also, an HTML forming a BWS document uses the following namespace.

HTMUIDOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"

"http://www.w3.org/TR/HTML32.dtd"> b

A VoiceXML forming a speech Web document has the following name space.

IQ <vxml version = "2.0" xmlns= "http://www.w3.org/2OOl /vxml" xmlns :xsi= http://www.w3.org/2001 /XMLSchema-instance xsi :schemaLocation= "http://www.w3.org/2001 /vxml> http://www.w3/org/TR/voicexml20/vxml.xsd">

15 To take an example, the entire namespace including the visual Web document and the speech Web document may be designated as follows:

<?xml version ="1.0" encoding ="UTF-8"> <vxml version ="2.O' xmlns = "http"//www.w3.org/2001 /vxml

„₍, xmlns = "http://www.w3.org/20OI /XMLSchema-instance xmlns = "http://www.worlddab.org/schemas/va" xsi :schemaLocation =

"http://www.w3.org/2OOl /vxml http7/www .w3.org/TR/voicexml20/vxml/sxd http://www.worlddab.org/schemas/va http://www.worlddab.org/schemas/va/va.xsd">

25 The synchronization tags for processing synchronization between the visual Web document, i.e., HTML, and the speech Web document, i.e., the VoiceXML, should describe synchronization between an application, documents, and forms within the document. The

30 synchronization tags used for the purpose are <esync>, <isync> and <fsync>. The tags <esync> and <isync> are in charge of synchronization between an application and documents, whereas the <fsync> tag is in charge of synchronization between forms. Herein, the synchronization between the application and the documents should be simultaneously loaded, interpreted and rendered in the initial period when the application starts. The synchronization between the forms signifies that user input data are simultaneously inputted to a counterpart form.

The <esync> tag is used to describe synchronization information between applications or between documents, when synchronization related information, i.e., <esync>, and related attributes do not exist in the speech Web document but exist in an independent external document, e.g., a document with an extension name of '.sync'. The <esync> tag supports the synchronization function based on the attributes shown in the following Table 1.

This designates a URI for a speech part document to be synchronized Also it vadoc specifies a form of VoiceXML as follows : e.g. ,

" . /ensemble.vxml#sndform".

This is a URI for a BWS part document to be bwsdoc synchronized.

The external document synchronization using the <esync> tag requires metadata which provide the speech Web document with information on the external synchronization document. For the metadata, used is a <metasync> tag having attributes defined as shown in Table 2.

Table 2

The <metasync> tag should be positioned in the speech Web document and it provides metadata to the <esync> tags stored in the external document. The entire operation mechanism is as shown in Fig. 3. That is, the synchronization document and the related <esync> tags are interpreted through the <metasync> tag described in the speech Web document and then the related BWS document is simultaneously loaded and rendered.

The <isync> tag indicates a synchronization method of a document. Differently from the <esync> tag, it is not authored in a separate document but it is formed by directly describing related synchronization tags within a predetermined form. Herein, the form includes a <form> tag and a <menu> tag of a VoiceXML, in a speech Web document. This is to support synchronization occurring when a predetermined form of the speech Web document should be synchronized with a BWS Web document and when a predetermined document needs to be transited.

According to an example, when there are a plurality of forms in on speech Web document and each form requires BWS documents having multiple pages and a synchronized operation with the BWS documents, it can be resolved by describing related <isync> tags in each form. Actually, the tag <isync> may be described in tags <link> or <goto> of the VoiceXML and secure synchronized transit.

Therefore, when a specific speech Web document is stacked and no specific form is designated, synchronization is processed with reference to the synchronization document providing the initial <esync>. When a related tag <isync> is designated to a specific form, the tag <isync> has a priority to be synchronized with the BWS Web document. For the synchronization, another <isync> tag is used, which is shown in the following Table. When the <isync> tag is defined in the <link> or <goto> tag, the BWS Web document should be transited according to the definition of the <isync> tag. When the form has a designated synchronization, there should be only one <isync> related to this form. However, when synchronization is specified in a form, there may be a plurality of transit tags. In other words, when synchronization is described with the tag <goto> or <link>, there may be a plurality of <isync> tags. The <isync> tags should be necessarily described in the <goto> or <link>, and synchronization information only affects to the transition process. For this, an attribute ^λtype' is supported.

The attributes of the <isync> tag for realizing synchronization based on the tag <isync> tag are as shown in the following Table 3

Table 3

The following shows an example that shows synchronization of an application document authored by using the tags <esync> and <isync>.

<?xml version ="1.0" encoding = "UTF-8">

<va:metasync doc="main.esync" syncid ~"#service_main" />

In this data service, you can get various information dedicated to life in general such as local hot news, local transportation, shopping and so on. Are you ready for surfing this service? </bhck> </form>

<form id = "hotnews_intro"> <va:isync> id = "hotnews jntro sync" type ="form" next= "hotnews_mtro.html" </va:isync>

<fιeld name ="move_to_news_page">

<item> news </item> <item> local news </item> <item> hot news </item> <item> headline </item> </one-op' </rule> </grammar>

<prompt> In this service, headline news and breaking news are provided.

Do you want to move? </prompt~> <fllled>

<va:isync type = "transit" next = "../../hotnews. html" /> </goto> </if> </filled>

<catch event — "noinput"

<form id = "game_intrυ"> <va:isync> id = "game_intro_sync" type ="form" next="game _inlro.html" </va:isync>

< field name ="move to_game_page">

<prompt> In this service, voice quiz and multimodal games are provided.

Do you want to play? </prompt> <filled>

<va:isync type = "transit" next ="../../games.html"/> </goto> </if> </flIled> </βeld> </foπn>

When the document is executed in the above example, a ^Xλservice_main_intro" dialogue is outputted by speech and, at the same time, a corresponding html page which affects the entire document, for example, a main page of the entire service, is synchronized based on the

<metasync> tag and rendered onto a screen. Herein, since bargine is not permitted, user input data are not processed and the screen is maintained until the next

^■dialogue is executed. When a "hotnews_intro" dialogue is executed, a BWS Web document corresponding to the

"hotnews_intro" is automatically synchronized based on the "hotnews_intro_sync" <isync>, and eventually loaded and rendered. The <fsync> tag is needed for inter-form synchronization between the speech Web document and the BWS Web document and it processes the user input data. The concept of the <fsync> tag is similar to document synchronization. It signifies that, when the user input data are processed through speech recognition of the speech Web document, the processed user input data are transferred to and reflected in a <input> tag of the BWS. Conversely, when data are inputted from a user on the BWS, the content of the user input data is reflected in a <field> tag of the speech Web document. For this, an input function should be provided to the two-party- modules, and synchronization mechanism should be accompanied. The <fsync> tag is a sort of executable contents. It may be positioned within the <form> of the VoiceXML or it may exist independently. If any, the scope of the <fsync> tag is limited to a document. When the <fsync> has a global scope over all documents, it should be specified in a root document. Then, it may be activated in all documents.

When the ^λfield' attribute is not specified, it means that the <field> of a corresponding form is an object to be synchronized. In this case, the form should have only one <field> tag. If there are a plurality of <field> tags in one form, its attributes should be specified necessarily. Also, each field should have a unique name.

The <fsync> tag has the attributes shown in the following Table 4 and it should be in charge of synchronization between forms.

Table 4

Attribute Function field This signifies a <field> name of a VoiceXML

Also, the following conditions should be satisfied to achieve the form synchronization.

Speech data input from the user should be updated in the <field> tag of VoiceXML and <input> tag of the BWS HTML.

Visual data inputted from the user, such as data input through a keyboard or a pen, should be updated in the <input> tag of HTML and the <field> tag of VoiceXML simultaneously.

Visual data inputted from the user should satisfy a guard condition of the <field> tag of VoiceXML.

The <field> tag of VoiceXML should be matched one- to-one with the <input> tag of HTML in the moment when the inputted data are about to be reflected.

The form synchronization should be carried out in parallel to the document synchronization. That is, the <field> or <input> tag to be synchronized may be validly updated only in a document already synchronized. In short, the tags of the two modules, which should receive the inputted data, should mutually exist in the synchronized document. When they are described on the external document, only the 'root' document is allowed for general synchronization. The data should be mutually inputted only in the form of an activated speech Web document. In short, there are a plurality of <input> tags in one BWS document and, when data are inputted into an <input> tag which is not linked with a <form> tag activated in the speech Web document, update into the <form> currently activated in the speech Web document is prohibited. For example, when a form corresponding to a speech output dictating to input a card number is executed and a valid date field is requested to be filled into an <input> of the BWS Web document, the mixed initiative form operation is prohibited. An example of the form synchronization is as follows .

10

<!— Voice Part -> <form id = "gasstationjsync"> <va:isync> id = "id_gas" type ^"document" next= "lbs_service.html"

</ va:isync>

If you say your city name, you can get your gas station information. </prompt>

This service is only available in Seoul, Daejeon, Kwangju, Taegue, Busan, lncheon. So please say above cities. </prompt> </catch~>

<filled> Jocation" /> method = "post" namelist =^■ "your Jocation" /> </fllled> </fieldϊ- </fbrm>

<head>

<title> Location based service </title> </head> <body> <->3 <hl> Local gas station information </hl>

<p> If you type the name of your city, you can get some gas station information </p>

</form>

</body> When the <fsync> tag is specified as shown in the above example, values are simultaneously inputted from the speech module to the <input> tags of the BWS Web document. Fig. 4 is a block view describing a Digital Multimedia Broadcasting (DMB) system providing a broadcasting Web sites (BWS) service capable of simultaneous speech input/output.

Referring to Fig. 4, the DMB system for providing speech-based BWS service capable of simultaneous speech input/output can be divided into a DMB transmission part and a DMB reception part based on a DAB system. The DMB transmission part includes a content data generator 110, a multimedia object transfer server (MOT) 120, and a DMB transmitting system 130. The content data generator 110 generates speech contents (speech Web documents) and BWS contents (visual Web documents). The MOT server 120 transforms the directory and file objects of the speech contents and BWS contents into MOT protocols before they are transmitted.

The DMB transmitting system 130 multiplexes the respective MOT data of the transformed MOT protocol, which include both speech Web documents and visual Web documents, with different directory of the same sub- channel or different sub-channels and broadcasts them through a DMB broadcasting network. The present invention, however, is not limited to them and a speech Web document and a visual Web document may be generated in an external device and transmitted from the external device.

The DMB broadcasting reception part, i.e., the DMB receiving block 200, includes a DMB baseband receiver 210, an MOT decoder 220, and a DMB integrated Web browser 230. The DMB baseband receiver 210 receives DMB broadcasting signals from the DMB broadcasting network based on the DAB system, performs decoding for corresponding subchannels, and outputs data of the respective sub-channels. The MOT decoder 220 decodes packets transmitted from the DMB baseband receiver 210 and restores MOT objects. The DMB integrated Web browser 230 executes the restored MOT objects, which include directories and files, independently or based on a corresponding synchronization method. In the present invention, the restored objects includes visual Web documents and speech Web documents related to the visual Web documents, and the DMB integrated Web browser 230 analyzes the aforementioned synchronization tags during the generation of synchronization event and executes the synchronization function based on the synchronization tags. Fig. 5 is a block view illustrating an integrated Web browser of Fig. 4.

Referring to Fig. 5, the integrated Web browser 230 includes a speech Web browser 233, a BWS browser 235, and a synchronization management module 231. The speech Web browser 233 drives speech markup generation language extended based on VoiceXML . The BWS browser 235 drives Web pages based on the HTML. The synchronization management module 231 manages synchronization between the speech Web browser 233 and the BWS browser 235. The speech Web browser 233 sequentially drives Web pages authored in the VoiceXML-based speech markup language, outputs speech to the user, and processes user input data through a speech device. The BWS browser 235 drives the Web pages authored in the HTML language defined in the DAB/DMB specifications and displays input/output on a screen, just as commercial browsers. The synchronization management module 231 receives synchronization events generated in the speech Web browser 233 and the BWS browser 235 and synchronizes corresponding pages and forms of each page based on the pre-defined synchronization protocol (synchronization tags).

Examples of the DMB receiving block 200 include a personal digital assistant (PDA), a mobile communication terminal, and a settop box for a vehicle that can receive and restore DAB and DMB service.

As described above, the method of the present invention can be realized in a computer-readable recording medium, such as CD-ROM, RAM, ROM, floppy disks, hard disks, magneto-optical disks and the like. Since the process can be easily implemented by those skilled in the art to which the present invention pertains, further description will not be provided herein.

While the present invention has been described with respect to certain preferred embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the scope of the invention as defined in the following claims,

Claims

What is claimed is;

1. A method for synchronizing visual data with speech data to provide a broadcasting Web sites (BWS) service capable of simultaneously inputting/outputting speech data in a multimedia broadcasting service, comprising the steps of: a) generating a visual Web document; b) generating a speech Web document including synchronization tags related to the visual Web document; and c) identifying the speech Web document and the visual Web document based on a sub-channel or a directory and transmitting the speech Web document and the visual Web document independently.

2. The method as recited in claim 1, wherein the synchronization tags include an external document synchronization tag <esync> that provides synchronization information with the visual Web document which should be simultaneously loaded and rendered along with the speech Web document when an application is rendered and driven.

3. The method as recited in claim 2, wherein the synchronization tags further include a metadata tag

<metasync> that provides metadata on the external document synchronization tags <esync>.

4. The method as recited in claim 3, wherein the external document synchronization tag <esync> has attribute information defined as the following table:

Attribute Function id This is an identifier indicating an <esync> tag and referred to in a <metasync> tag.

5. The method as recited in claim 3, wherein the metadata tag <metasync> has attribute information defined as the following table:

6. The method as recited in claim 1, wherein the synchronization tags includes an internal document synchronization tag <isync> that provides synchronization information with the visual Web document related to a corresponding form among the forms of the speech Web document.

7. The method as recited in claim 6, wherein the internal document synchronization tag <isync> has attribute information defined as the following table:

^yform, ' a corresponding <isync> signifies that a predetermined form is synchronized with a predetermined document. When it is ^Λtransit, ' it signifies synchronization that should be simultaneously transited to the next document. A default value is the 'form.'

8. The method as recited in claim 1, wherein the synchronization tags include an inter-form synchronization tag <fsync> that provides synchronization information between a form of the speech Web document and a form of the visual Web document to secure mutual input synchronized between the form of the speech Web document and the form of the visual Web document.

9. The method as recited in claim 8, wherein the inter-form synchronization tag <fsync> has attribute information defined as the following table:

10. An apparatus for synchronizing visual data with speech data to provide a broadcasting Web sites (BWS) service capable of simultaneously inputting/outputting speech data in a multimedia broadcasting service, comprising: a) a content data generator for generating a visual Web document and a speech Web document including synchronization tags related to the visual Web document; b) a multimedia object transfer (MOT) server for transforming the generated visual Web document and the speech Web document as well into an MOT protocol; and c) a transmitting system for identifying the speech Web document and the visual Web document of the MOT protocol based on a sub-channel or a directory and transmitting the speech Web document and the visual Web document independently.

11. The apparatus as recited in claim 10, wherein the synchronization tags include an external document synchronization tag <esync> that provides synchronization information with the visual Web document which should be simultaneously loaded and rendered along with the speech Web document when an application is authored and driven.

12. The apparatus as recited in claim 10, wherein the synchronization tags further include a metadata tag <metasync> that provides metadata on the external document synchronization tags <esync>.

13. The apparatus as recited in claim 10, wherein the synchronization tags include an internal document synchronization tag <isync> that provides synchronization information with the visual Web document related to a corresponding form among the forms of the speech Web document.

14. The apparatus as recited in claim 10, wherein the synchronization tags include an inter-form synchronization tag <fsync> that provides synchronization information between a form of the speech Web document and a form of the visual Web document to secure mutual input synchronized between the form of the speech Web document and the form of the visual Web document.

15. A method for synchronizing visual data with speech data to provide a broadcasting Web sites (BWS) service capable of simultaneously inputting/outputting speech data in a multimedia broadcasting service, comprising the steps of: a) receiving and loading a visual Web document and a speech Web document including synchronization tags related to the visual Web document, the visual Web document and the speech Web document being identified based on a sub-channel or a directory and transmitted independently; and b) analyzing the synchronization tags when a synchronization event occurs and performing a corresponding synchronization operation.

16. The method as recited in claim 15, wherein a synchronization tag <metasync> that provides metadata on an external document synchronization tag <esync>, which provides the synchronization information with the visual Web document, are analyzed and the related visual Web document is loaded and rendered in the step b).

17. The method as recited in claim 16, wherein when an internal document synchronization tag <isync> that provides synchronization information with the visual Web document related to a corresponding form among the forms of the speech Web document during execution of the speech input/output is designated, the internal document synchronization tag <isync> is analyzed and the related visual Web document is loaded and rendered.

18. The method as recited in claim 15, wherein when user data are inputted while the application on the speech input/output is driven, synchronization is performed by using an inter-form synchronization tag <fsync> that provides synchronization information between a form of the speech Web document and a form of the visual Web document.

19. An apparatus for synchronizing visual data with speech data to provide a broadcasting Web sites (BWS) service capable of simultaneously inputting/outputting speech data in a multimedia broadcasting service, comprising: a) a baseband receiver for receiving broadcasting signals through a multimedia broadcasting network and performing channel decoding; b) a multimedia object transfer (MOT) decoder for decoding channel-decoded packets and restoring a visual Web document and a speech Web document including synchronization tags related to the visual Web document; and c) an integrated Web browser for analyzing the synchronization tag when a synchronization event occurs and executing a corresponding synchronization operation.

20. The apparatus as recited in claim 19, wherein the integrated Web browser analyzes a metadata tag

<metasync> that provides metadata on an external document synchronization tag <esync> providing synchronization information with the visual Web document, and loads and renders the related visual Web document.

21. The apparatus as recited in claim 20, wherein when an internal document synchronization tag <isync> that provides synchronization information with the visual Web document related to a corresponding form among the forms of the speech Web document during execution of the speech Web document is designated, the integrated Web browser analyzes the internal document synchronization tag <isync>, and loads and renders the related visual Web document.

22. The apparatus as recited in claim 19, wherein when user data are inputted while the application on the speech input/output is driven, the integrated Web browser performs synchronization by using an inter-form synchronization tag <fsync> that provides synchronization information between a form of the speech Web document and a form of the visual Web document.