WO2002021843A2 - System to index/summarize audio/video content - Google Patents

System to index/summarize audio/video content

Info

Publication number
WO2002021843A2
Authority
WO
WIPO (PCT)
Prior art keywords
information
content material
production
source
ancillary
Application number
PCT/EP2001/009974
Other languages
French (fr)
Other versions
WO2002021843A3 (en)
Inventor
Eric Cohen-Solal
Hugo Strubbe
Mi-Suen Lee
Original Assignee
Koninklijke Philips Electronics N.V.
Application filed by Koninklijke Philips Electronics N.V.
Priority to EP01978317A (published as EP1393568A2)
Priority to JP2002526124A (published as JP2004508776A)
Priority to KR1020027006025A (published as KR20020060964A)
Publication of WO2002021843A2
Publication of WO2002021843A3

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50: Information retrieval of still image data
    • G06F 16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583: Retrieval using metadata automatically derived from the content
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00: Details of television systems
    • H04N 5/76: Television signal recording
    • G06F 16/60: Information retrieval of audio data
    • G06F 16/64: Browsing; Visualisation therefor
    • G06F 16/68: Retrieval of audio data characterised by using metadata
    • G06F 16/683: Retrieval of audio data using metadata automatically derived from the content
    • G06F 16/70: Information retrieval of video data
    • G06F 16/78: Retrieval of video data characterised by using metadata

Abstract

The 'background information' that is available during the production of the content material is correlated with the content material to facilitate a selective access to a source of audio/video content material. This production information includes, for example, the camera settings used during the collection of the content material. Other production information includes sound controls, scene identifiers, source identifiers, instructions that are communicated to the staff that produces the content material, and so on. To facilitate indexing, the production information is processed and filtered to produce a collection of symbolic representations of the production information, each symbol corresponding to a determined event or characteristic. This symbolic production information is preferably combined with other sources of indexing and summarizing information, to provide a set of annotations to the content material that facilitate efficient and effective selective retrieval.

Description

System to Index/Summarize Audio/Video Content
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to the field of consumer electronics, and in particular to a system that facilitates the indexing and summarizing of audio/video content for efficient search and retrieval of select content.
2. Description of Related Art
Advances are being made continually in the field of automated story segmentation and identification, as evidenced by the BNE (Broadcast News Editor) and BNN (Broadcast News Navigator) of the MITRE Corporation (Andrew Merlino, Daryl Morey, and Mark Maybury, MITRE Corporation, Bedford, MA, Broadcast News Navigation using Story Segmentation, ACM Multimedia Conference Proceedings, 1997, pp. 381-389). Using the BNE, newscasts are automatically partitioned into individual story segments, and the first line of the closed-caption text associated with the segment is used as a summary of each story. Key words from the closed-caption text or audio are determined for each story segment. The BNN allows the consumer to enter search words, with which the BNN sorts the story segments by the number of keywords in each story segment that match the search words. Based upon the frequency of occurrence of matching keywords, the user selects stories of interest. Similar search and retrieval techniques are becoming common in the art. For example, conventional text searching techniques can be applied to a computer-based television guide, so that a person may search for a particular show title, a particular performer, shows of a particular type, and the like.
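As an illustrative aside (not part of the original disclosure), the BNN-style retrieval described above reduces to ranking story segments by the count of keyword matches. A minimal Python sketch, with hypothetical story data:

```python
# Sketch of BNN-style story retrieval: story segments are ranked by how many
# of their extracted keywords match the user's search words. The segment
# data and keyword lists here are illustrative, not from the patent.

def rank_stories(stories, search_words):
    """Sort story segments by the count of keywords matching the search words."""
    search = {w.lower() for w in search_words}
    scored = []
    for story in stories:
        matches = sum(1 for kw in story["keywords"] if kw.lower() in search)
        scored.append((matches, story))
    # Highest match count first, mirroring the frequency-of-occurrence ordering.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [story for matches, story in scored if matches > 0]

stories = [
    {"summary": "Senate passes budget bill", "keywords": ["senate", "budget", "vote"]},
    {"summary": "Local team wins final", "keywords": ["football", "final", "score"]},
]
print(rank_stories(stories, ["budget", "senate"]))  # budget story ranked first
```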
These indexing and search techniques are also being developed for recorded entertainment material, such as audio and video CDs and DVDs. Users will be able to search, for example, for a particular scene by specifying a performer's name and a characterization of the scene, such as "monologue, John Smith". In response, the search device will present one or more scenes containing John Smith performing a monologue.
One of the difficulties encountered in categorizing and indexing material for retrieval is the need to "annotate" the material with the relevant information for facilitating an efficient retrieval. A manual process can be used to add indexing information to each recorded or broadcast set of content material, but such a process is costly, hence the need for the above-mentioned automated indexing systems, such as BNE. Conventionally, for example, an automated indexing system first recognizes each change of scene, or cut, by searching for frames that differ substantially from their immediately preceding frames. Thereafter, if a frame contains a close-up facial shot, and the context of the program is "news broadcast", the following sequence of frames may be identified as a "newscaster" clip, whereas, if the frame contains a full-figure shape, the following sequence of frames may be identified as an "on-location" clip. As noted above, any closed-caption text is also used to identify and categorize scenes. Although these techniques are proving to be somewhat effective, they rely heavily on the content of the material, such as the images or voices in each scene, to determine an appropriate set of indexing parameters to characterize the material.
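The conventional cut-detection step described above can be sketched as follows; this is an illustration rather than the patent's method, and the grayscale frame format and threshold value are assumptions:

```python
# Hedged sketch of conventional cut detection: a new scene is declared when
# a frame differs substantially from its immediately preceding frame.
# Frames are modeled as grayscale arrays; the threshold is illustrative.

import numpy as np

def find_cuts(frames, threshold=30.0):
    """Return indices where the mean absolute pixel difference exceeds threshold."""
    cuts = []
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(float) - frames[i - 1].astype(float)).mean()
        if diff > threshold:
            cuts.append(i)  # frame i starts a new shot
    return cuts

# Synthetic example: two near-identical frames, then an abrupt change.
frames = [np.full((4, 4), 10), np.full((4, 4), 12), np.full((4, 4), 200)]
print(find_cuts(frames))  # -> [2]
```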
The MPEG-7 standard addresses the need for effective indexing and search capabilities, and calls for a "Multimedia Content Description Interface" that is intended to standardize a set of description schemes and descriptors, a description definition language, and a scheme for coding the description. In particular, the MPEG-7 standard calls for the ability to associate descriptive information within video streams at various stages of video production, including pre- and post-production scripts, information captured or annotated during shooting, and post-production edit lists. By adding this information during the production of the video material, the quality and effectiveness of the annotations is expected to be substantially improved, compared to an after-production annotation of material in a video archive.
BRIEF SUMMARY OF THE INVENTION It is an object of this invention to improve the effectiveness of content indexing and summarizing systems for audio/video content material. It is a further object of this invention to provide additional ancillary information to facilitate the indexing and summarizing of content material.
These objects and others are achieved by correlating the "background information" that is available during the production of the content material with the content material. This production information includes, for example, the camera settings used during the collection of the content material. Other production information includes sound controls, scene identifiers, source identifiers, instructions that are communicated to the staff that produces the content material, and so on. Instructions from a director to each camera operator, for example, can provide insight into the content of the subsequent images from the cameras. In like manner, automatically generated instructions from automated camera systems can also provide insight. To facilitate indexing, the production information is processed and filtered to produce a collection of symbolic representations of the production information, each symbol corresponding to a determined event or characteristic. This symbolic production information is preferably combined with other sources of indexing and summarizing information, to provide a set of annotations to the content material that facilitate efficient and effective selective retrieval. The techniques presented herein are also particularly well suited for annotating the content material of videoconferences, to facilitate identification of key segments of a videoconference recording.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention is explained in further detail, and by way of example, with reference to the accompanying drawings wherein:
FIG. 1 illustrates an example production scene for gathering audio/video information.
FIG. 2 illustrates an example block diagram of a production recording system in accordance with this invention. FIG. 3 illustrates an example block diagram of an indexing/summarizing system in accordance with this invention.
FIG. 4 illustrates an example block diagram of a production recorder in accordance with this invention.
Throughout the drawings, the same reference numerals indicate similar or corresponding features or functions.
DETAILED DESCRIPTION OF THE INVENTION
For ease of reference and understanding, the terms "indexing" and "summarizing" are used herein to reference particular applications for this invention. This invention addresses a method and device for providing information that is associated with content material, and is not limited by how that information is used. Although this provided information may be particularly well suited for use as an index to facilitate a search for material, or for use as a synopsis to facilitate a quick review or preview of the material, one of ordinary skill in the art will recognize that this invention is not limited to these particular applications.
FIG. 1 illustrates an example production scene for gathering audio/video information. The scene includes a director 110 who directs the action of the camera operators 120, 121, as well as the performers 130, 131 and objects 140. The example scene is representative of a directed scene. As will be evident to one of ordinary skill in the art, this invention is also applicable to an undirected scene, such as the recording of a news or sports event. In a live sports event, for example, a production director continuously determines which camera is the "on-line" camera, depending upon the current action, directs "off-line" cameras to capture specific scenes for potential selection as the "on-line" camera, or potential use for "instant replay". In a news broadcast, the production director also performs a similar selection of source material for transmission. In a videoconference, participants at either site can adjust the camera setting, typically in accord with the videoconference activities. As can be imagined, the director 110 of FIG. 1 may issue instructions, such as "Camera 1, follow Joe (130) as he runs into the crowd. Camera 2, follow Jim (131). Make sure you both have the briefcases in view when they switch them." This invention is premised on the observation that such "behind-the-scenes" commands, which form a portion of the production information corresponding to the scene, convey a substantial amount of information for interpreting images of scenes. For example, the three directive sentences, above, impart a meaning to the scene of FIG. 1 that would be difficult to deduce based solely on the image content. And, if there is no dialog associated with the scene of FIG. 1, the use of closed-caption text will provide minimal assistance in discerning this meaning. The published scripts, synopses, and scene edit lists, cited above with regard to MPEG-7, may contain appropriately descriptive information, but the published information may not reflect the events as they actually happen on the production site. On the other hand, the three example sentences, above, convey the information that Joe and Jim are in the scene, that this is the scene wherein briefcases are switched, and so on. This production information also conveys information that can be used in an indirect manner to facilitate an interpretation of other sources of information. For example, an image processing system will likely identify this scene as a "group" scene, and may or may not identify Joe or Jim in the group, whereas the example sentences imply that the crowd is merely "background". In like manner, the presence of the automobiles 140 in the scene can also be interpreted as insignificant background information, based on the absence of a reference to them in the production directives. That is, the production information not only conveys direct information regarding the content of the subsequent images, but also provides cues that facilitate efficient processing by other sources of indexing or summarizing information.
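To make the "directives as cues" idea concrete, here is a minimal sketch that mines a directive for foreground performers and objects; the name and object lists are hypothetical assumptions, not part of the disclosure:

```python
# Illustrative sketch of mining director commands for indexing cues:
# performers and key objects named in a directive are treated as foreground,
# and anything unmentioned in the frame defaults to background.

KNOWN_PERFORMERS = {"Joe", "Jim"}          # assumed cast list for the example
KEY_OBJECTS = {"briefcase", "briefcases"}  # assumed prop list for the example

def cues_from_directive(directive):
    """Extract foreground performers and objects referenced in a directive."""
    words = directive.replace(",", " ").replace(".", " ").split()
    performers = sorted(w for w in words if w in KNOWN_PERFORMERS)
    objects = sorted(w.lower() for w in words if w.lower() in KEY_OBJECTS)
    return {"foreground_performers": performers, "foreground_objects": objects}

print(cues_from_directive("Camera 1, follow Joe as he runs into the crowd."))
# -> {'foreground_performers': ['Joe'], 'foreground_objects': []}
```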
FIG. 2 illustrates an example block diagram of a production recording system 200 in accordance with this invention. A production recorder 210 receives information from a variety of sources 220, 230, 240, and produces a database of production information 215 that captures the production information in an efficient form for subsequent processing. A primary source of production information is vocal input 220. A variety of vocal sources 220 typically provide production information. For example, in a news broadcast, the production booth provides a source of vocal information; the on-site reporter may give directions to an on-site camera operator; before the on-air transmission, the news coordinator may give advice to the studio newscasters, or the on-site reporter, and so on.
The production recorder 210 is configured to process each of these sources of information, to extract pertinent information, and to record the pertinent information for subsequent use in a retrieval process. Depending upon the complexity of the analysis, this processing may be performed in real time, as the images are being recorded, or as a post-processing task. In a preferred embodiment, the production recorder 210 also directly records the information from the sources 220-240, to facilitate a subsequent selectable degree of processing and analysis. The real-time analysis of the production information, for example, uses the 'primary' source of production information, such as the vocal information from a microphone associated with the director, and the other sources of production information are processed as required at a later time, for a more detailed analysis and assessment. In addition to vocal commands, a production recording system 200 in a preferred embodiment includes input 230 corresponding to parameters associated with the individual cameras. For example, the zoom setting of a camera can be used to provide a characterization of a scene. A narrow-angle, high-zoom setting typically implies a directed focus to an individual or event; or, a change of zoom setting indicates a change of emphasis on the individual or event. A wide-angle, or low-zoom setting is typically associated with a "background" or "mood-setting" scene. The identification of a series of images being captured at a low zoom setting can be used, for example, by an image-based categorizing system to "skip ahead" to higher zoom setting images, based on this likely correlation of significance and zoom setting. Or, in a learning system, each new scene may be assessed, regardless of zoom setting; if the image processor finds little or no discernible information at a particular zoom setting, it may then activate a 'skip-ahead' function. Assuming a consistency in video capture techniques, particularly within the same production, the decision to 'skip-ahead' can be made more and more quickly, thereby improving the efficiency of the image-based categorizing system. In like manner, the orientation, or rate of change of orientation can also be used to characterize a scene. For example, in sports, the capture of a kick-off, forward-pass, or home-run would likely involve a relatively rapid change of camera orientation, whereas an end-line-rush or strike-out would likely not involve a change of camera orientation. Other sources 240 of production information, such as the location of sound booms during recording, and other means of identifying the 'focus' of the scene can also be used to facilitate an indexing or summarizing of recorded content material. In like manner, the 'source' of the content material can convey information regarding the scene. For example, in a news broadcast, the identification of a scene as coming from "file footage" can be used to either minimize the processing of the scene, or provide a link to a prior processing and characterizing of this footage. Note that as production tasks become more and more automated, or at least managed via computer resources, the source of production information becomes substantial. For example, a sequencing and selection of sources during a newscast can be expected to be controlled via a computer. The capture of this information for each production can substantially increase the effectiveness and efficiency of other content indexing and summarizing tools.
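The zoom-based characterization and "skip-ahead" behavior described above might look like the following sketch; the zoom breakpoints are illustrative assumptions, since the text only correlates high zoom with directed focus and low zoom with background scenes:

```python
# Sketch of the zoom-based characterization heuristic. Breakpoints assumed.

def characterize_by_zoom(zoom):
    """Map a normalized zoom setting (0.0 wide .. 1.0 narrow) to a scene label."""
    if zoom >= 0.7:
        return "focus"        # narrow-angle: directed focus on an individual/event
    if zoom <= 0.3:
        return "background"   # wide-angle: mood-setting or background scene
    return "neutral"

def frames_to_analyze(frame_zooms, skip_background=True):
    """Skip ahead past low-zoom frames once the categorizer learns they are barren."""
    return [i for i, z in enumerate(frame_zooms)
            if not (skip_background and characterize_by_zoom(z) == "background")]

print(frames_to_analyze([0.1, 0.2, 0.8, 0.9, 0.5]))  # -> [2, 3, 4]
```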
Note that the choice of sources 220-240 of production information is discretionary, and need not conform to traditional production techniques. For example, copending US patent application "HANDS-FREE HOME VIDEO PRODUCTION CAMCORDER", serial number 09/532,820, filed 3/21/00 for Mi-Suen Lee, Attorney Docket US000063, discloses a technique and device that automatically adjusts a camera's field of view to capture scenes of likely interest, and is incorporated by reference herein. The adjustment is based, for example, on object tracking techniques, sound localization and focusing, etc., and incorporates knowledge-based system techniques to emulate the actions of an experienced camera operator. As used with this invention, the resultant automatically generated camera settings provide the camera input 230 to the production recorder 210, discussed above, with or without any other sources of production information. In like manner, copending US patent application "MULTIMODAL VIDEO TARGET ACQUISITION AND RE-DIRECTION SYSTEM AND METHOD", serial number 09/488,028, filed 1/20/00 for Eric Cohen-Solal and Mi-Suen Lee, Attorney Docket US000015, discloses a technique and device that adjusts a camera's field of view based on gestures and key words, and is incorporated by reference herein. Similarly, copending US patent application "METHOD AND APPARATUS FOR TRACKING MOVING OBJECTS USING COMBINED VIDEO AND AUDIO INFORMATION IN VIDEOCONFERENCING AND OTHER APPLICATIONS", serial number 09/548,734, filed 4/13/00 for Hugo Strubbe and Mi-Suen Lee, Attorney Docket US000103, discloses a technique and device that adjusts a camera's field of view based on an analysis of the video and audio content of the scene being recorded. As used with this invention, the resultant camera settings, and the gestures, speech, and movements used to cause these settings, or the analysis of the gestures, speech, or movements, may be provided to the production recorder 210 for use in generating the production information database 215.
The camera settings during a videoconference can similarly be used to facilitate a characterization of a videoconference session. Advanced videoconferencing systems are expected to include the automated and semi-automated camera control features discussed above, and even relatively simple systems allow the participants in the videoconference to adjust the camera field of view, either at their site, or at the remote site. Or, a camera operator may be provided at a central site of the videoconference, or at the site of a key speaker, and so on. A stationary camera position, with perhaps some variations in zoom, for an extended time duration could be indicative of a keynote address, particularly when correlated to the audio content from each videoconference site. In like manner, a continuing back and forth rotation of the camera could be indicative of a key discussion period. As noted above, the combination of this production information with an analysis of the content material can provide insights that may not be readily apparent from the content material, and thereby improve the quality and efficiency of providing summaries for each videoconference. For example, once the camera setting corresponding to each of a plurality of speakers is determined (or explicitly provided), character identification in images becomes significantly simpler if the camera setting corresponding to the images is used to pre-filter the choices of characters that are provided to the character identification process. The speaker identification process can be similarly enhanced by providing camera settings corresponding to each audio track. In like manner, a rapid determination that the audio track does not match the identified participant in the field of view corresponding to the current camera settings can be used to modify the current camera settings to search for the current speaker. These and other synergistic effects will be evident to one of ordinary skill in the art as the use of this invention becomes commonplace.
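A minimal sketch of the pre-filtering idea for videoconference participant identification follows; the preset-to-participant mapping is a hypothetical example, not part of the disclosure:

```python
# Sketch of pre-filtering: once each camera preset is known to frame
# particular participants, identification only has to choose among them.

PRESET_PARTICIPANTS = {  # assumed mapping for illustration
    "preset_podium": ["keynote_speaker"],
    "preset_left_table": ["alice", "bob"],
    "preset_right_table": ["carol"],
}

def candidate_identities(camera_preset, all_participants):
    """Restrict the identification search space using the active camera preset."""
    return PRESET_PARTICIPANTS.get(camera_preset, all_participants)

everyone = ["keynote_speaker", "alice", "bob", "carol"]
print(candidate_identities("preset_left_table", everyone))  # -> ['alice', 'bob']
print(candidate_identities("unknown_preset", everyone))     # falls back to everyone
```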
To facilitate synchronization of the production information with the content material, a time reference 201 is associated with the recorded production information 215. As would be obvious to one of ordinary skill in the art, some of the production information, such as the camera settings 230, will be coincident in time with the scenes of the content material, while other information, such as vocal directives 220, typically precedes the scenes to which it applies. Knowledge-based and heuristic techniques are used to determine a correlation between specific directives 220 and the content material. For example, a directive 220 that is followed by a significant adjustment of camera settings 230, or the occurrence of a 'cut' in scenes, is likely to contain information relevant to the following clip. Otherwise, for example, if there is no significant change in the other inputs to the production recorder 210, or to the content material, the directives are likely to be relevant to the current clip. Other techniques for determining cause and effect relationships are common in the art. Other scene identification and synchronizing input 202 may also be provided, particularly in directed scenes, wherein an explicit identification of the scene (e.g. "Rocky IX, Scene 32, Take 3") is available.
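The cause-and-effect heuristic for attaching a directive to the current or following clip could be sketched as below; the time window and adjustment threshold are illustrative assumptions:

```python
# Sketch of directive-to-clip association: a directive soon followed by a
# large camera adjustment or a cut is attached to the *next* clip,
# otherwise to the current clip. Window and threshold are assumed values.

def attach_directive(directive_time, camera_events, window=5.0, min_change=0.4):
    """Decide whether a directive applies to the current or the next clip.

    camera_events: list of (timestamp, zoom_change, is_cut) tuples.
    """
    for t, zoom_change, is_cut in camera_events:
        if directive_time < t <= directive_time + window:
            if is_cut or abs(zoom_change) >= min_change:
                return "next_clip"
    return "current_clip"

events = [(12.0, 0.05, False), (14.5, 0.6, False)]
print(attach_directive(11.0, events))  # big zoom change follows -> 'next_clip'
print(attach_directive(20.0, events))  # nothing follows -> 'current_clip'
```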
FIG. 3 illustrates an example block diagram of an indexing/summarizing system 300 in accordance with this invention. As illustrated, an indexer/summarizer 310 preferably has access to a variety of information 215, 320-323 that facilitates the characterization, or indexing, or summarizing, of content information. Closed-caption information 320 is commonly used to characterize content material, such as in the BNE and BNN systems, discussed above. In accordance with this invention, the production information can be used to improve the efficiency or effectiveness of this categorizing process. For example, the scene of FIG. 1 may include dialog that is included in the closed-caption material, but this dialog may be provided merely as a diversion from the exchange of briefcases, or as filler material while the exchange takes place. The indexer/summarizer 310 in a preferred embodiment uses the production information 215 associated with the scene of FIG. 1, for example, to decrease a significance-weighting-factor associated with the closed-caption information 320, while increasing a corresponding significance-weighting-factor associated with the image information 321.
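A sketch of the significance-weighting adjustment described above, with assumed initial weights and shift factor:

```python
# Sketch of shifting significance from closed captions to images when the
# production information marks the dialog as diversion or filler.
# Initial weights and the shift factor are assumptions for illustration.

def adjust_weights(weights, caption_is_filler, factor=0.5):
    """Shift significance weight from closed captions to images when captions are filler."""
    w = dict(weights)
    if caption_is_filler:
        shifted = w["closed_caption"] * factor
        w["closed_caption"] -= shifted
        w["image"] += shifted
    return w

weights = {"closed_caption": 0.5, "image": 0.5}
print(adjust_weights(weights, caption_is_filler=True))
# -> {'closed_caption': 0.25, 'image': 0.75}
```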
The image information 321 in the content material is used to categorize content material based on the visual characteristics of each image. For example, the scene of FIG. 1 might be characterized as "outdoor, group, vehicles, pedestrians", based on simple pattern and context recognition techniques. Depending upon the system's capabilities, the indexer/summarizer 310 may also include a recognition of one or more of the actors and actresses in the scene. Copending U.S. patent application, "PROGRAM CLASSIFICATION USING OBJECT TRACKING", serial number 09/452,581, filed 12/1/99 for Nevenka Dimitrova and Lalitha Agnihotri, Attorney Docket PHA 23,846, discloses a content-based classification system that detects the presence of facial images and text images within a frame and determines the path, or trajectory, of each image through multiple frames of the video segment. The combination of face trajectory and text trajectory information is used to classify each segment of a video sequence. In the example of FIG. 1, the production information 215 facilitates this object tracking, based, for example, on the references to "Joe" and "Jim" in the directive sentences. In like manner, the object tracking facilitates the synchronization of the directives to the content material, by associating the directive sentences to the current or next scene, depending upon whether "Joe" and "Jim" are in the respective scenes. These and other techniques of combining production information with image information to facilitate a characterization of scenes or clips will be evident to one of ordinary skill in the art in view of this disclosure. The aforementioned MPEG-7 standardization effort is expected to provide useful semantic descriptors for efficient and effective indexing and summarizing of audio/video content material, as well as the syntax required to facilitate cross-platform utilization of this information.
Context information 322 also facilitates the characterization of the content material that is provided by the indexer/summarizer 310. For example, if the context of a scene is a sports event, the interpretation of the terminology contained in the production information 215 or closed-caption information 320 may be modified; the correlation between the closed-caption information 320 and the image information 321 is modified, based on the likelihood that the closed-caption information corresponds to the broadcaster, and not necessarily to the individuals portrayed in the images; and so on.
If the indexer/summarizer 310 is at a user's home, or is customized for a particular user, user information 323 may also be used to facilitate the characterization of the content material. Copending U.S. patent application, "PERSONALIZED NEWS RETRIEVAL SYSTEM", serial number 09/220,277, filed 12/23/98 for Jan H. Elenbaas, Tomas McGee, Nevenka Dimitrova, and Mark Simpson, Attorney Docket PHA 23,590, presents techniques for customizing the categorizing and retrieval of information based on a user's preferences or viewing habits, and is incorporated by reference herein. In the context of this application, knowledge of the user's preferences and/or habits facilitates the processing of the production information 215, and other information 320-322, by providing a prioritization for particular aspects of the indexer/summarizer 310. For example, if the user rarely retrieves information based on a performer's name, the indexer/summarizer will devote fewer resources to characterizing or indexing the content information based on recognition of characters via the production information 215, and other information 320-322. Conversely, if the user commonly searches for particular performers, the indexer/summarizer 310 spends additional time and resources to track each performer, using the plurality of information sources 215, 320-322, to provide a comprehensive index of the performers who appear in each scene or clip. These and other techniques for optimizing the efficiency and effectiveness of the indexer/summarizer 310 will be evident to one of ordinary skill in the art in view of this disclosure.
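This user-driven prioritization could be realized as a simple proportional budget split, sketched below with hypothetical facet names and counts:

```python
# Sketch of allocating indexing effort according to how often the user
# actually searches by each facet. Facet names and budget are assumptions.

def allocate_effort(total_budget, search_history):
    """Split an indexing budget across facets proportionally to past usage."""
    total = sum(search_history.values()) or 1  # avoid division by zero
    return {facet: total_budget * count / total
            for facet, count in search_history.items()}

history = {"performer": 2, "scene_type": 8, "keyword": 10}
print(allocate_effort(100.0, history))
# performer gets little effort (rarely searched); keyword gets the most
```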
The indexer/summarizer 310 provides information that is appended to the content matter, typically as an annotated version of the content material 350. A DVD provider, for example, will use the system 300 to provide a DVD that includes the indexing or summarizing information associated with each scene of the content material on the DVD. A corresponding DVD player is configured to facilitate a search for particular scenes in the content material based on the included indexing or summarizing information. Alternatively, the indexing/summarizing system 300 may be independent of the provider of the content material, and may provide the indexing or summarizing information as an independent adjunct to the content material. For example, a vendor may provide this indexing and summarizing information on an Internet site, and may provide an application program that allows a user to effect the aforementioned search via an Internet-access device, such as a Web-TV device, or a personal computer (PC).
FIG. 4 illustrates an example block diagram of a production recorder 210 in accordance with this invention. In a preferred embodiment, the production recorder 210 includes a speech recognizer 420, a field of view processor 430, and a scene synchronizer 410, for processing the production-related inputs 201-240 from a variety of sources. The speech recognizer 420 provides a translation from spoken sounds to recognized words, and is used to process the vocal inputs 220. The field of view processor 430 provides an interpretation of camera settings to characterize the current field of view of each camera 230. The scene synchronizer 410 processes the synchronizing inputs 201, 202 to facilitate synchronization between the production information and the content material. Other processors 440 are provided as required for processing the input from the other sources 240 of production information.
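A structural sketch of the recorder of FIG. 4 follows, with the speech recognizer, field of view processor, and scene synchronizer stubbed out as placeholders; the internals here are assumptions for illustration, not the disclosed implementations:

```python
# Sketch of the recorder pipeline: each input stream is handled by its own
# processor and the results are time-stamped into production records.

from dataclasses import dataclass, field

@dataclass
class ProductionRecorder:
    records: list = field(default_factory=list)

    def on_vocal_input(self, timestamp, text):
        # Stand-in for the speech recognizer (420): text assumed pre-recognized.
        self.records.append({"t": timestamp, "type": "directive", "value": text})

    def on_camera_settings(self, timestamp, camera_id, zoom, pan):
        # Stand-in for the field of view processor (430).
        self.records.append({"t": timestamp, "type": "camera",
                             "camera": camera_id, "zoom": zoom, "pan": pan})

    def synchronized(self, scene_id):
        # Stand-in for the scene synchronizer (410): tag records with the scene.
        return [{**r, "scene": scene_id} for r in self.records]

rec = ProductionRecorder()
rec.on_vocal_input(10.0, "Camera 1, follow Joe")
rec.on_camera_settings(10.5, camera_id=1, zoom=0.8, pan=-5.0)
print(rec.synchronized("Rocky IX, Scene 32, Take 3"))
```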
In a preferred embodiment, a symbolic encoder 450 encodes the production information 215 in symbolic form, to facilitate subsequent processing by the indexer/summarizer 310 (FIG. 3). The production recorder 210 includes a symbol library 460 that facilitates this symbolic encoding. For example, the symbol library 460 in a preferred embodiment includes symbols for key words that the symbolic encoder 450 uses to encode words that are provided by the speech recognizer 420. In like manner, the symbol library includes symbols corresponding to particular camera settings or combinations of settings, to facilitate the encoding of the camera characteristics provided by the field of view processor 430. A variety of techniques may be used for maintaining an effective symbol library 460. Copending U.S. patent application "IMAGE CLASSIFICATION USING EVOLVED PARAMETERS", serial number 09/343,649, filed 6/29/99 for Keith Mathias, J. David Schaffer, and Murali Mani, Attorney Docket PHA 23,696, discloses the use of genetic, or evolutionary, algorithms for optimizing the parameters used to classify images, and is incorporated by reference herein.
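The symbolic encoding against a symbol library might be sketched as a simple lookup; the symbol table contents below are hypothetical:

```python
# Sketch of the symbolic encoder (450) and symbol library (460): recognized
# words and camera characterizations map to compact symbols that the
# indexer/summarizer can process uniformly. Table entries are assumptions.

SYMBOL_LIBRARY = {
    ("word", "follow"): "SYM_TRACK",
    ("word", "switch"): "SYM_EXCHANGE",
    ("camera", "focus"): "SYM_FOCUS_SHOT",
    ("camera", "background"): "SYM_WIDE_SHOT",
}

def encode(kind, value):
    """Map a recognized word or camera characterization to a library symbol."""
    return SYMBOL_LIBRARY.get((kind, value), "SYM_UNKNOWN")

print(encode("word", "follow"))        # -> SYM_TRACK
print(encode("camera", "background"))  # -> SYM_WIDE_SHOT
print(encode("word", "hello"))         # -> SYM_UNKNOWN (not in the library)
```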
The foregoing merely illustrates the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are thus within its spirit and scope. For example, video content information has been used as the paradigm for presenting the application of this invention, although the categorizing and indexing of other types of content information may also be facilitated by the inclusion of production information. The directives and equipment settings in a sound studio, for example, may be used to index or summarize audio content material. Similarly, in addition to its use for indexing and retrieval, the production information 215 that is included in the annotated audio/video content 350, particularly vocal comments of the director, may be of direct interest to the user, and may provide a marketing advantage for media that contains this information. These and other system configuration and optimization features will be evident to one of ordinary skill in the art in view of this disclosure, and are included within the scope of the following claims.

Claims

CLAIMS:
1. A method of providing ancillary information related to content material including: collecting source information (220-240) related to production of the content material, and processing the source information (220-240) to provide the ancillary information, wherein the source information (220-240) includes at least one of: one or more directives (220) issued during the production of the content material, and one or more parameters (230, 240) that are associated with one or more items of equipment used during the production of the content material.
2. The method of claim 1, wherein processing the source information (220-240) includes at least one of: recognizing words associated with a vocal input corresponding to the one or more directives (220), and recognizing a field of view setting associated with at least one camera (230) corresponding to the one or more items of equipment.
3. The method of claim 1, wherein processing the source information (220-240) also includes processing other information (320-323) to provide the ancillary information, the other information (320-323) including at least one of: closed-caption information (320) associated with the content material, image information (321) associated with the content material, audio information associated with the content material, context information (322) associated with the content material, and user information (323) associated with a user of the content material.
4. The method of claim 3, wherein the processing of the source information (220-240) facilitates summarizing the content material.
5. The method of claim 1, wherein the content material corresponds to scenes of a videoconference.
6. The method of claim 1, further including synchronizing the ancillary information with the content material, to facilitate a search for particular segments of the content material (350), based on the ancillary information.
7. The method of claim 1, further including storing production information (215) associated with the source information (220-240) in symbolic form, to facilitate the processing of the source material.
8. The method of claim 1, wherein the ancillary information is provided consistent with an MPEG-7 specification.
9. A method of providing ancillary information related to content material including: collecting camera parameters (230) that are associated with one or more camera settings used during the production of the content material, and producing the ancillary information, based on the camera parameters (230).
10. The method of claim 9, wherein providing the ancillary information also includes processing other information (320-323) to provide the ancillary information, the other information (320-323) including at least one of: closed-caption information (320) associated with the content material, image information (321) associated with the content material, audio information associated with the content material, context information (322) associated with the content material, and user information (323) associated with a user of the content material.
11. A recording system (210) comprising: an encoder (450) that accepts as input: source information (220-240) associated with a production of content material, and synchronizing data (201, 202) associated with the content material, and produces therefrom production information (215) that facilitates selective access to the content material.
12. The recording system (210) of claim 11, wherein the source information (220-240) includes at least one of: directives (220) associated with the production of the content material, and settings (230-240) associated with equipment used for the production of the content material.
13. The recording system (210) of claim 11, wherein the encoder (450) comprises at least one of: a speech recognition system (420) that processes vocal source information (220) associated with the production of the content material, and a field of view processor (430) that processes parameters (230) associated with at least one camera that is associated with the production of the content material.
14. An information processing system (300) comprising: a source of production information (215) that is related to production of content material, a processor (310), operably coupled to the source of production information (215), that is configured to provide ancillary information related to the content material, wherein the production information (215) includes at least one of: one or more directives (220) issued during the production of the content material, and one or more parameters (230, 240) that are associated with one or more items of equipment used during the production of the content material.
15. The information processing system (300) of claim 14, wherein the source of production information (215) includes at least one of: a speech recognition system (420) that is configured to recognize words associated with a vocal input corresponding to the one or more directives (220), and a field of view processor (430) that processes parameters (230) associated with at least one camera corresponding to the one or more items of equipment.
16. The information processing system (300) of claim 14, further including at least one source of other information (320-323), the other information (320-323) including at least one of: closed-caption information (320) associated with the content material, image information (321) associated with the content material, context information (322) associated with the content material, and user information (323) associated with a user of the content material; wherein the processor (310) is operably coupled to the at least one source of other information (320-323), and is further configured to provide the ancillary information based also on the other information (320-323).
17. The information processing system (300) of claim 14, further including a synchronizer (410) that is configured to provide a correlation between the ancillary information and the content material.
18. The information processing system (300) of claim 14, wherein the ancillary information facilitates an identification of characters associated with the content material.
19. The information processing system (300) of claim 14, wherein the ancillary information is provided in accordance with an MPEG-7 specification.
20. An information processing system (300) comprising: an input for receiving production information (215) that is related to production of content material, a processor (310), operably coupled to the input, that is configured to provide ancillary information related to the content material, wherein the production information (215) includes one or more parameters (230) that are associated with one or more cameras used during the production of the content material.
21. The information processing system (300) of claim 20, further including at least one source of other information (320-323), the other information (320-323) including at least one of: closed-caption information (320) associated with the content material, image information (321) associated with the content material, context information (322) associated with the content material, and user information (323) associated with a user of the content material; wherein the processor (310) is operably coupled to the at least one source of other information (320-323), and is further configured to provide the ancillary information based also on the other information (320-323).

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP01978317A EP1393568A2 (en) 2000-09-11 2001-08-27 System to index/summarize audio/video content
JP2002526124A JP2004508776A (en) 2000-09-11 2001-08-27 System for indexing / summarizing audio / image content
KR1020027006025A KR20020060964A (en) 2000-09-11 2001-08-27 System to index/summarize audio/video content

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US65918200A 2000-09-11 2000-09-11
US09/659,182 2000-09-11

Publications (2)

Publication Number Publication Date
WO2002021843A2 true WO2002021843A2 (en) 2002-03-14
WO2002021843A3 WO2002021843A3 (en) 2003-12-18

Family

ID=24644380

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2001/009974 WO2002021843A2 (en) 2000-09-11 2001-08-27 System to index/summarize audio/video content

Country Status (4)

Country Link
EP (1) EP1393568A2 (en)
JP (1) JP2004508776A (en)
KR (1) KR20020060964A (en)
WO (1) WO2002021843A2 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5136655A (en) * 1990-03-26 1992-08-04 Hewlett-Packard Company Method and apparatus for indexing and retrieving audio-video data
WO1998057251A1 (en) * 1997-06-13 1998-12-17 Panavision, Inc. Multiple camera video assist control system
WO1999036863A2 (en) * 1998-01-13 1999-07-22 Koninklijke Philips Electronics N.V. System and method for selective retrieval of a video sequence

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004021221A2 (en) * 2002-08-30 2004-03-11 Hewlett-Packard Development Company, L.P. System and method for indexing a video sequence
WO2004021221A3 (en) * 2002-08-30 2004-07-15 Hewlett Packard Development Co System and method for indexing a video sequence
US7483624B2 (en) 2002-08-30 2009-01-27 Hewlett-Packard Development Company, L.P. System and method for indexing a video sequence
EP1684198A3 (en) * 2005-01-20 2006-08-09 Samsung Electronics Co., Ltd. Digital photo managing apparatus and method, and computer recording medium storing program for executing the method
US20080052104A1 (en) * 2005-07-01 2008-02-28 Searete Llc Group content substitution in media works
US20080086380A1 (en) * 2005-07-01 2008-04-10 Searete Llc, A Limited Liability Corporation Of The State Of Delaware Alteration of promotional content in media works
US9583141B2 (en) 2005-07-01 2017-02-28 Invention Science Fund I, Llc Implementing audio substitution options in media works
WO2008141539A1 (en) * 2007-05-17 2008-11-27 Huawei Technologies Co., Ltd. A caption display method and a video communication system, apparatus
CN102289490A (en) * 2011-08-11 2011-12-21 杭州华三通信技术有限公司 Video summary generating method and equipment
CN105096668A (en) * 2014-05-16 2015-11-25 北京天宇各路宝智能科技有限公司 Teaching voice and video manufacturing system and manufacturing method

Also Published As

Publication number Publication date
EP1393568A2 (en) 2004-03-03
JP2004508776A (en) 2004-03-18
WO2002021843A3 (en) 2003-12-18
KR20020060964A (en) 2002-07-19

Similar Documents

Publication Publication Date Title
Zhang et al. An integrated system for content-based video retrieval and browsing
Dimitrova et al. Applications of video-content analysis and retrieval
US8528019B1 (en) Method and apparatus for audio/data/visual information
CA2924065C (en) Content based video content segmentation
KR100915847B1 (en) Streaming video bookmarks
EP1692629B1 (en) System & method for integrative analysis of intrinsic and extrinsic audio-visual data
US8238718B2 (en) System and method for automatically generating video cliplets from digital video
KR20010041194A (en) Personalized video classification and retrieval system
US20050028194A1 (en) Personalized news retrieval system
Chen et al. Detection of soccer goal shots using joint multimedia features and classification rules
Roach et al. Recent Trends in Video Analysis: A Taxonomy of Video Classification Problems.
KR20030023576A (en) Image information summary apparatus, image information summary method and image information summary processing program
US8051446B1 (en) Method of creating a semantic video summary using information from secondary sources
WO2002021843A2 (en) System to index/summarize audio/video content
Wang et al. Automatic composition of broadcast sports video
Chen et al. Multi-criteria video segmentation for TV news
Volkmer et al. Gradual transition detection using average frame similarity
Hu et al. Combined-media video tracking for summarization
Papachristou et al. Human-centered 2D/3D video content analysis and description
Aas et al. A survey on: Content-based access to image and video databases
Dimitrova et al. Selective video content analysis and filtering
Dimitrova et al. PNRS: personalized news retrieval system
Muhammad et al. Content based identification of talk show videos using audio visual features
Evans Teaching computers to watch television: Content-based image retrieval for content analysis
JP2005530267A (en) Stored programs and segment precicipation / dissolution

Legal Events

Code Title Description
AK Designated states. Kind code of ref document: A2. Designated state(s): JP KR.
AL Designated countries for regional patents. Kind code of ref document: A2. Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR.
121 EP: the EPO has been informed by WIPO that EP was designated in this application.
ENP Entry into the national phase. Ref country code: JP. Ref document number: 2002 526124. Kind code of ref document: A. Format of ref document f/p: F.
WWE WIPO information: entry into national phase. Ref document number: 1020027006025. Country of ref document: KR.
WWP WIPO information: published in national office. Ref document number: 1020027006025. Country of ref document: KR.
WWE WIPO information: entry into national phase. Ref document number: 2001978317. Country of ref document: EP.
ENP Entry into the national phase. Ref country code: RU. Ref document number: 2003107921. Kind code of ref document: A. Format of ref document f/p: F.
WWP WIPO information: published in national office. Ref document number: 2001978317. Country of ref document: EP.
WWW WIPO information: withdrawn in national office. Ref document number: 2001978317. Country of ref document: EP.